openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Added initial zimwriterfs clone #165

Open rgaudin opened 1 month ago

rgaudin commented 1 month ago

Here's a first shot at an implementation of zimwriterfs using scraperlib.

It uses the same interface except for two missing features:

Very little code here; it's mostly CLI. I did not use make_zim_file() so that we can set the creator options that zimwriterfs allows setting…

WARN: dumping this so it's not lost, but I'm not sure it's ready (I haven't written any tests yet)

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (7d49831) to head (a288082).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main      #165   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           32        32
  Lines         1393      1393
  Branches       240       240
=========================================
  Hits          1393      1393
```


rgaudin commented 1 month ago

There is no real need for this, so it can wait for its tests; I'd love to complete the missing features (or have someone else do it) and add a workflow that builds an independent static binary as well (just to annoy @kelson42!)

@kelson42 what is the purpose of the --inflateHtml feature?

The way I understand it, when adding files, every file whose extension is considered HTML is decompressed in place as if it were zip (zlib)-encoded, and if that fails, the raw content is added instead.
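To make that reading concrete, here is a minimal sketch of the behaviour as described above. It is not taken from zimwriterfs or from this PR: the `maybe_inflate` helper and the `HTML_SUFFIXES` set are hypothetical names, and the exact list of extensions treated as HTML is an assumption.

```python
import zlib
from pathlib import Path

# Assumption: which extensions count as "HTML" is a guess for illustration.
HTML_SUFFIXES = {".html", ".htm"}


def maybe_inflate(path: Path, raw: bytes) -> bytes:
    """Return the content to add for `path` (hypothetical helper).

    HTML-looking files are tentatively zlib-decompressed; anything that
    is not valid zlib data is returned untouched.
    """
    if path.suffix.lower() not in HTML_SUFFIXES:
        return raw
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Not zlib-encoded after all: fall back to the raw content.
        return raw
```

With this sketch, a zlib-compressed `page.html` comes out decompressed, while a plain `page.html` or any non-HTML file is passed through unchanged.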

I'm curious to know what scenario led to this ugly thing 🙃