openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
19 stars 16 forks source link

Add arguments to make_zim_file and relax dependencies constraints #147

Closed benoit74 closed 6 months ago

benoit74 commented 6 months ago

Changes

I intend to release this as 3.3.2 as soon as it is merged since we want to proceed with TED upgrade to benefit from new encoder presets.

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (b479038) to head (d18275c).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #147 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 32 32 Lines 1399 1399 Branches 240 240 ========================================= Hits 1399 1399 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

benoit74 commented 6 months ago

Nota: this has already been tested locally to work fine with ted scraper

benoit74 commented 6 months ago

I was just filling a gap, since it was intentional I will remove this right now.

benoit74 commented 6 months ago

Change done and first comment updated to reflect the new PR content.

benoit74 commented 6 months ago

Arf, I added two args to fill the gap and you made two remarks, so I misread them as being linked to these two args. I did not realized your first remark was about the workaround_nocancel which was already there before this PR. Let's discuss it live to be sure we are aligned

benoit74 commented 6 months ago

After discussion:

After some thought, I think that we have to expose ignore_duplicates, someone very cautious in his scraper could run a deduplication on the files before calling make_zim_file, add needed redirects / aliases after that, and expect the libzim to stop if it detects a duplicate.

rgaudin commented 6 months ago

👍 ; as long as it defaults to True :)

rgaudin commented 6 months ago

But someone cautious would not be using the zimwriterfs mode 😅

benoit74 commented 6 months ago

Does your 👍 means you've reviewed this PR and I can merge or that you've approved my proposition?

rgaudin commented 6 months ago

Yes