Open benoit74 opened 2 weeks ago
The content of PDF documents is not indexed for full-text search, even though in some ZIMs the PDFs are the "core" content.
Extracting PDF info would be beneficial to many scrapers and should thus ideally be exposed in scraperlib.
See e.g. https://github.com/openzim/warc2zim/issues/289