Closed by benoit74 1 month ago
Extracting PDF info would be beneficial to many scrapers and should thus ideally be exposed in scraperlib.
I just created the scraperlib issue to implement this. It will not make it into 3.4.0, and I am not sure when it will be planned.
The content of PDF documents is not indexed for full-text search, even though in some ZIMs the PDFs are the "core" of the content.
For instance, in fas-military-medicine_en (https://dev.library.kiwix.org/viewer#fas-military-medicine_en_2024-05, or https://dev.library.kiwix.org/#lang=&q=military+medicine), there is only one main page plus PDF documents, so full-text search is not usable.