lorenzodifuccia / safaribooks

Download and generate EPUB of your favorite books from O'Reilly Learning (aka Safari Books Online) library.
Do What The F*ck You Want To Public License
4.62k stars 685 forks

Fails to download equations correctly #211

Closed: soniamehtaml closed this issue 1 year ago

soniamehtaml commented 4 years ago

I found an issue when downloading books: the mathematical equations used in the books are downloaded incorrectly, so the final formulas are not readable.

TheSnoozer commented 4 years ago

Can you provide a book ID where the issue can be observed?

soniamehtaml commented 4 years ago

> Can you provide a book ID where the issue can be observed?

You can try it with 9781098115555; you will notice that all mathematical expressions are distorted, misarranged, or broken.

You can also check some of the Machine Learning books; the results are the same. I verified this in both the EPUB and the XHTML files (same broken expressions).

TheSnoozer commented 4 years ago

Hmm, both the EPUB and the XHTML look good on my side. Could you also provide a screenshot or a page number? After taking that screenshot, could you delete the entire downloaded content (including the EPUB) and try again?

soniamehtaml commented 4 years ago

You have probably made some changes to correct the equations; they now look somewhat better in the EPUB.

However, the EPUB still does not look like a correct copy of the book. There are still many issues with graphics, colours, and images. I am sharing some sample images.

Error Sample 1

Error Sample 2

Error Sample 3

soniamehtaml commented 4 years ago

I have also checked the XHTML output files, and they still show the equation errors. I tested the whole code on fresh installs in both a Windows virtual machine and a Linux virtual machine, but the errors are the same.

TheSnoozer commented 4 years ago

Interesting. Do you use the latest version?

Sadly, I can't reproduce the problem on my end: I have no display issues in the XHTML files, and the EPUB looks fine too. However, since you also see the issue in the downloaded XHTML files, I currently assume that something went wrong while downloading the CSS files (yes, the fancy equations use CSS to be displayed).

Do you have two CSS files in the OEBPS/Styles folder of the book (e.g. Books/Essential Math for Data Science (9781098115555)/OEBPS/Styles)?
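
A quick way to check this is sketched below. It assumes the folder layout from the example path above; the helper name `list_css_files` is made up for illustration and is not part of safaribooks.

```python
from pathlib import Path


def list_css_files(book_dir):
    """Return the sorted names of the .css files in <book_dir>/OEBPS/Styles.

    Hypothetical helper for illustration; an empty list means the
    Styles folder is missing or contains no stylesheets.
    """
    styles_dir = Path(book_dir) / "OEBPS" / "Styles"
    if not styles_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in styles_dir.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".css"
    )


if __name__ == "__main__":
    # Example path taken from this thread; adjust it to your own download.
    book = "Books/Essential Math for Data Science (9781098115555)"
    found = list_css_files(book)
    print(f"{len(found)} CSS file(s) found: {found}")
```

If this reports fewer than two files for this book, the CSS download is the likely culprit.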

Can you remove the entire downloaded content again (sorry) and this time run with --preserve-log (e.g. python safaribooks.py 9781098115555 --preserve-log)? In my log I can see the following:

[03/May/2020 01:46:29] Crawler: found a new CSS at https://learning.oreilly.com/library/css/essential-math-for/9781098115555/epub.css
[03/May/2020 01:46:29] Crawler: found a new CSS at https://learning.oreilly.com/static/CACHE/css/output.8054605313ed.css

Do you see a similar log message here?
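
If the log is long, a small script along these lines can pull out the relevant entries. It is only a sketch: it assumes the log lines have the format shown above, the function name `find_css_urls` is made up for illustration, and the log file path has to be passed in by you.

```python
import re
import sys

# Matches entries such as:
# [03/May/2020 01:46:29] Crawler: found a new CSS at https://...
CSS_LINE = re.compile(r"Crawler: found a new CSS at (\S+)")


def find_css_urls(log_text):
    """Return every CSS URL reported by the crawler, in log order."""
    return CSS_LINE.findall(log_text)


if __name__ == "__main__":
    # Usage: python find_css.py <path to the preserved log file>
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for url in find_css_urls(handle.read()):
                print(url)
```

An empty result would suggest that the crawler never found the stylesheets in the first place.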

soniamehtaml commented 4 years ago

Is the error caused by the new CSS? How do I get rid of it?