openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Display buttons to the various book formats only for requested formats #163

Closed benoit74 closed 1 year ago

benoit74 commented 1 year ago

Fix #159

Sample command used for tests (for some tests below, popularity of book 18813 has been tweaked in the extracted RDF in rdf-files to be bigger than the popularity of book 18812, so that sorting by popularity/title produces a different result):

python gutenberg2zim -z gut_fr_test.zim -b 18812,18813 -f epub,html

What has been checked while browsing the resulting ZIM:

benoit74 commented 1 year ago

Do you mean that the button is still there even when you request only epub,html? This is not what I get. There is another known issue (#160): the PDF button is present on all books when you request epub,html,pdf, even if the book does not have a PDF available on PG. That has nothing to do with the UI / ZIM creation stage; the issue is that the scraper forces the presence of a PDF format for all books at the parsing phase.

rgaudin commented 1 year ago

Do you mean that the button is still there even when you request only epub,html? This is not what I get.

Because you're not testing against the correct books. Try with the command I mentioned.

There is another known issue (#160): the PDF button is present on all books when you request epub,html,pdf, even if the book does not have a PDF available on PG. That has nothing to do with the UI / ZIM creation stage; the issue is that the scraper forces the presence of a PDF format for all books at the parsing phase.

Then what should this PR be fixing? The ticket you're closing is named “UI does not limit the buttons displayed to only requested formats” and the PR itself is named “Display buttons to the various book formats only for requested formats”. I am seeing a format button (PDF) for a format I did not request. That's exactly what I understand this PR should fix. Now, if you're saying PDF is there because of a different bug, why am I also seeing the epub button on the book page when requesting only html?

benoit74 commented 1 year ago

I'm very sorry, but I do not see a format-button (PDF) when using the following command:

python gutenberg2zim -z gut_fr_test_withoutpdf.zim -b 18812,18813,16816 -f epub,html

Did you clear the dl-cache folder? I forgot to mention that cover pages are generated in this folder and are not regenerated if you request a different list of formats (yes, it sucks, but I'm not sure how to fix this).

You are right that the PDF button should be there only if requested, and this is what I observe in my tests.

What I meant is that when PDF is requested (and only then), the button will be there for all books, even those with no PDF actually available; this is the other issue. In other words, for now it is expected to have PDF buttons with broken links for books which do not have a PDF format in PG, but this should happen only if PDF is requested.
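
To make the distinction between the two issues concrete, here is a minimal sketch of the intended behavior. It is not the scraper's actual code: the function name `visible_format_buttons` and its parameters are hypothetical and only illustrate the rule discussed above. A button is shown only for a requested format, and the separate issue is about additionally checking whether the book really has that format.

```python
def visible_format_buttons(requested_formats, available_formats=None):
    """Return the formats whose buttons should be shown for one book.

    Hypothetical helper, for illustration only.
    requested_formats: formats passed on the command line (e.g. via -f).
    available_formats: formats the book really has; None means
        "availability is not checked", which mirrors the other issue
        where a requested PDF button shows up for every book.
    """
    requested = [fmt.strip().lower() for fmt in requested_formats]
    if available_formats is None:
        # Only the "requested" rule applies: no button for unrequested formats.
        return requested
    available = {fmt.strip().lower() for fmt in available_formats}
    # Both rules apply: requested AND actually available for this book.
    return [fmt for fmt in requested if fmt in available]


# Requesting only epub,html must never yield a PDF button.
print(visible_format_buttons(["epub", "html"]))
# Requesting pdf for a book without a PDF: the button disappears
# only once availability is taken into account.
print(visible_format_buttons(["epub", "html", "pdf"], ["epub", "html"]))
```

With availability left unchecked, a requested PDF button appears for every book, which matches the broken-link behavior described above.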

rgaudin commented 1 year ago

Oh I get it now. Thanks for clarifying.

Now, removing the dl-cache didn't help. Neither did removing the DB. What fixed it was removing all files (except files_on_dante) from the tmp/ folder. It's really a terrible experience.

It does work as expected from a clean state; thanks
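
For anyone retesting, the cleanup described above can be scripted. The following is only a sketch under assumptions: it supposes that `tmp/` is relative to the current working directory and that `files_on_dante` is a direct entry of that folder (possibly with a file extension). The helper name `reset_tmp` is hypothetical and not part of the scraper.

```python
import shutil
from pathlib import Path


def reset_tmp(tmp_dir="tmp", keep=("files_on_dante",)):
    """Delete everything directly under tmp_dir except the entries in keep.

    Assumption: entries to keep are matched by full name or by name
    without extension. Returns the sorted names of removed entries.
    """
    tmp = Path(tmp_dir)
    removed = []
    if not tmp.is_dir():
        # Nothing to clean if the folder does not exist.
        return removed
    for entry in tmp.iterdir():
        if entry.name in keep or entry.stem in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-folder and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed.append(entry.name)
    return sorted(removed)
```

Calling `reset_tmp()` before re-running the scraper with a different `-f` list should approximate the clean state mentioned above. Everything else under `tmp/` is deleted and will have to be downloaded or generated again.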