Closed benoit74 closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (1eddabc) to head (c91646f).
Some files, such as https://irp.fas.org/doddir/milmed/milderm.pdf, raise a "MuPDF error: format error: cmsOpenProfileFromMem failed" error. It looks fixable, since it is an ICC profile issue (which we do not care about): https://github.com/pymupdf/PyMuPDF/discussions/3572. I will fix this.
The fix is different than expected, but at least it works. The PR is ready for review again.
I did not pass `index_content: str | None = None` but `index_data: IndexData | None`, since it also allows setting the title used for suggestions, which is quite important (the item title is not used for suggestions when index data is passed).
I also modified `add_item_for`, since it is quite heavily used in scrapers.
Other than that, I think the change will please you.
> I did not pass `index_content: str | None = None` but `index_data: IndexData | None`, since it also allows setting the title used for suggestions
I see it's missing from my comment, but I meant `index_content` and `index_title`. I think requiring this extra import goes against what `add_item_for` tries to achieve, but you're the judge of that.
There are a couple of unresolved discussions…
> I see it's missing from my comment, but I meant `index_content` and `index_title`. I think requiring this extra import goes against what `add_item_for` tries to achieve, but you're the judge of that.
Then I get what you meant, and I agree the extra import is not very lean.
I finally decided to keep using `index_data` in `add_item_for` and `StaticItem`, because it is a convenient way to force the user to pass both title and content if they decide to customize the index data, and to let pyright detect when this is not done. Otherwise one might be tempted to pass only an `index_title` or only an `index_content`, which is not what we want.
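To illustrate the trade-off discussed above, here is a minimal sketch contrasting the two signatures. It is not the code from this PR: `IndexData` below is a simplified stand-in dataclass, and `add_item_separate`, `add_item_bundled` and `half_customized_is_rejected` are hypothetical helpers written only for this comparison. The point is that with two optional arguments a half-customized call type-checks and must be caught at runtime, while a single object with two required fields makes the mistake visible to pyright.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndexData:
    """Simplified stand-in (assumption), not the real class from the PR."""

    title: str  # required: title used for suggestions
    content: str  # required: text to index
    keywords: str = ""  # optional


def add_item_separate(
    title: str,
    index_title: str | None = None,
    index_content: str | None = None,
) -> tuple[str, str | None]:
    """Hypothetical variant with two optional arguments."""
    # A type checker accepts a call passing only one of the two,
    # so the mismatch can only be detected at runtime.
    if (index_title is None) != (index_content is None):
        raise ValueError("pass both index_title and index_content, or neither")
    return (index_title or title, index_content)


def add_item_bundled(
    title: str,
    index_data: IndexData | None = None,
) -> tuple[str, str | None]:
    """Hypothetical variant with a single index_data argument."""
    if index_data is None:
        # No customization: the item title is used.
        return (title, None)
    # Customization: title and content always arrive together.
    return (index_data.title, index_data.content)


def half_customized_is_rejected() -> bool:
    """Return True if building IndexData without content fails."""
    try:
        IndexData(title="Only a title")  # pyright reports the missing argument
    except TypeError:
        return True
    return False
```

With the bundled form, `add_item_bundled("Page", IndexData(title="Custom", content="Body"))` cannot be written with only one of the two values, at the cost of the extra import mentioned earlier in the thread.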
Fix #167 Fix #168
Edited description
Changes:
- `IndexData` to hold indexing data (title, content, keywords) before passing it to libzim
- `index_data: IndexData | None` and `auto_index: bool | None` for customizing indexing in `StaticItem` and `add_item_for`:
  - pass `index_data` from the caller for customized indexing
  - set `auto_index` to False to disable indexing (both in python-scraperlib and libzim)

Former description and points to discuss
Changes:
- `IndexingItem` class able to customize index data, either from data passed by the scraper or automatically from PDF content
- `IndexData` class holding the index data

Open points to discuss:
- Should we keep a separate `IndexingItem` class, or should we simply embed all this logic in `StaticItem`?
- Should we add an `add_indexing_item_for`, similar to `add_item_for`? Or just enrich `add_item_for` with new arguments?