Open benoit74 opened 1 month ago
Suggestions are based on the ZIM entry's title, so that is an easier task than full-text indexing.
It's easy to read PDF metadata via a third-party library, so if a Title is set we could use that, and default to the filename otherwise.
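A minimal sketch of that fallback, assuming pypdf as the third-party library (nothing in this thread settles which library scraperlib would use). The function names `pick_suggestion_title` and `read_pdf_title` are hypothetical and only there for illustration:

```python
from pathlib import PurePosixPath
from typing import Optional


def pick_suggestion_title(metadata_title: Optional[str], entry_path: str) -> str:
    """Return the PDF Title if it is set and non-blank, else the filename."""
    if metadata_title and metadata_title.strip():
        return metadata_title.strip()
    # No usable Title: fall back to the last component of the entry path.
    return PurePosixPath(entry_path).name


def read_pdf_title(pdf_file: str) -> Optional[str]:
    """Read the Title field from a PDF's metadata (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party, imported lazily on purpose

    metadata = PdfReader(pdf_file).metadata  # None when the PDF has no metadata
    return metadata.title if metadata else None
```

Typical use would be `pick_suggestion_title(read_pdf_title(local_file), entry_path)`. Treating a blank Title like a missing one is a choice made for this sketch, not something decided above.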
Does it mean you consider that all PDFs should be added to suggestions? (I'm still not sure on my side, but I can't find an example where I would not want a PDF added to the suggestions if we have a proper title.)
I have no strong opinion
Just created a scraperlib issue, since this should be implemented there. Not going to make it for 3.4.0.
And I think we should add the new CLI argument; better to include it now than to be blocked on some ZIM creation because of whatever problem this might cause.
PDF documents are not indexed for suggestions, while in some ZIMs they are the "core" of the ZIM.
For instance, in fas-military-medicine_en (https://dev.library.kiwix.org/viewer#fas-military-medicine_en_2024-05, or https://dev.library.kiwix.org/#lang=&q=military+medicine), there is only one main page plus PDF documents. Suggestions are not usable.
Not sure how to tackle this need, but clearly it is a bit sad not to have PDFs in suggestion lists for such ZIMs. This is probably not true for all ZIMs, so maybe add a CLI option?