lorenzodifuccia / safaribooks

Download and generate EPUB of your favorite books from O'Reilly Learning (aka Safari Books Online) library.
Do What The F*ck You Want To Public License

fix extracting img urls #218

Closed: nrenzoni closed this 4 years ago

nrenzoni commented 4 years ago

This PR extracts image URLs from the self.book_chapters JSON content in a new function, extract_image_links(chapters). The previous implementation instead collected them as a side effect of link_replace(self, link).
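A minimal sketch of what such an extraction function could look like. It assumes, without confirmation from the PR, that each chapter object in the JSON is a dict with an "asset_base_url" string and an "images" list of relative paths. Those key names are hypothetical, and the body is not the PR's actual code.

```python
from urllib.parse import urljoin


def extract_image_links(chapters):
    """Collect absolute image URLs from a list of chapter JSON objects.

    Hypothetical data shape: each chapter dict has an "asset_base_url"
    string and an "images" list of paths relative to that base.
    """
    imgs = []
    for chapter in chapters:
        base = chapter.get("asset_base_url", "")
        for path in chapter.get("images", []):
            # Resolve each relative image path against the chapter's base URL.
            # Note: urljoin drops the last base segment if the base lacks a
            # trailing slash, so the base is assumed to end with "/".
            imgs.append(urljoin(base, path))
    return imgs
```

Because the URLs come straight from the chapter metadata, nothing has to be inferred from rewritten link paths.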

This approach is preferable to patching the previous implementation. That route would have required manipulating image paths without any further information, which is a hacky solution. This fix instead reads the image URLs directly from the web JSON, so it should be more stable.

The fix also improves the code architecture: link_replace(self, link) no longer has any side effects. This is in line with the principle that each function or unit should perform only one task.
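To illustrate the side-effect point, here is a simplified, hypothetical contrast. The rewriting rules (image extensions, the "Images/" prefix, ".html" to ".xhtml") are assumptions chosen for the example and are not taken from safaribooks. The first version records images while rewriting links. The second only rewrites, so calling it never changes any state.

```python
import posixpath

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif")  # hypothetical list


class WithSideEffect:
    """Old pattern (hypothetical): rewriting a link also records it."""

    def __init__(self):
        self.images = []

    def link_replace(self, link):
        if link.endswith(IMAGE_EXTS):
            self.images.append(link)  # hidden side effect on self
            return "Images/" + posixpath.basename(link)
        return link.replace(".html", ".xhtml")


def link_replace(link):
    """New pattern (hypothetical): rewrite the link and mutate nothing."""
    if link.endswith(IMAGE_EXTS):
        return "Images/" + posixpath.basename(link)
    return link.replace(".html", ".xhtml")
```

With the pure version, image collection lives in one dedicated place and link rewriting can be tested in isolation.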

nrenzoni commented 4 years ago

Having looked over the code again, I should probably use a set for the imgs variable in extract_image_links (a set holds each element at most once). That would prevent downloading the same image more than once if it appears in multiple chapters.
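A minimal sketch of that deduplication, assuming (hypothetically) that each chapter dict exposes an "images" list of URLs. The function name unique_image_links is illustrative. It pairs a set, for fast membership checks, with a list that keeps first-seen order, since a bare set would not preserve download order.

```python
def unique_image_links(chapters):
    """Return each image URL once, even if several chapters reference it.

    Hypothetical data shape: each chapter dict has an "images" list of URLs.
    """
    seen = set()
    imgs = []
    for chapter in chapters:
        for url in chapter.get("images", []):
            if url not in seen:  # constant-time average membership test
                seen.add(url)
                imgs.append(url)  # keep first-seen order for downloads
    return imgs
```

If order does not matter, `imgs = set()` with `imgs.add(url)` is enough on its own.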