openzim / mindtouch

libretexts.org to ZIM scraper
GNU General Public License v3.0

Apply proper CSS for proper page display - step 1 #29

Closed benoit74 closed 1 week ago

benoit74 commented 2 weeks ago

This is part of #8

This first step handles the CSS stylesheets stored in external files (there are two: one for screen and one for print).
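Locating these external stylesheets in a page's HTML is the natural entry point. The following is a minimal sketch, assuming the stylesheets are declared as `<link rel="stylesheet">` tags whose `media` attribute distinguishes screen from print. It uses only the standard-library `html.parser`. `find_stylesheets` and the sample markup are hypothetical and are not the code from this PR.

```python
from __future__ import annotations

from html.parser import HTMLParser


class StylesheetLinkFinder(HTMLParser):
    """Collect (href, media) pairs for every <link rel="stylesheet"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stylesheets: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names for us
        if tag != "link":
            return
        attributes = {name: value or "" for name, value in attrs}
        # rel may hold several space-separated tokens, e.g. "alternate stylesheet"
        rel_tokens = attributes.get("rel", "").lower().split()
        if "stylesheet" in rel_tokens and attributes.get("href"):
            # media defaults to "all" when the attribute is absent or empty
            self.stylesheets.append((attributes["href"], attributes.get("media") or "all"))


def find_stylesheets(html: str) -> list[tuple[str, str]]:
    """Return (href, media) for each external stylesheet, in document order."""
    finder = StylesheetLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.stylesheets
```

Self-closing `<link ... />` tags are covered too, because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`.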

It fetches all assets (images, fonts) referenced in the CSS.
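Finding those assets mostly comes down to collecting the `url(...)` references in each stylesheet and resolving them against the stylesheet's own URL. Below is a minimal, regex-based sketch of that idea. `find_css_assets`, the regex, and the example URLs are hypothetical illustrations, not the PR's `css.py`, which may well use a proper CSS tokenizer instead. The regex does not handle escaped quotes or parentheses inside URLs, and it ignores bare `@import "..."` strings.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes; a simplification
# of the CSS grammar (escaped quotes/parentheses inside URLs are not handled).
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def find_css_assets(css: str, stylesheet_url: str) -> list[str]:
    """Return absolute URLs of assets (images, fonts, ...) referenced by url()."""
    assets: list[str] = []
    for match in CSS_URL_RE.finditer(css):
        raw_url = match.group(2).strip()
        if not raw_url or raw_url.startswith(("data:", "#")):
            continue  # inline data and fragment-only references need no download
        # Relative URLs in CSS are relative to the stylesheet, not the page
        absolute = urljoin(stylesheet_url, raw_url)
        if absolute not in assets:  # fetch each asset only once
            assets.append(absolute)
    return assets
```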

It rewrites the CSS so that its URLs point to the assets stored in the ZIM.
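Rewriting can reuse the same `url(...)` matching. Each reference is replaced with a path to the asset's location inside the ZIM, made relative to the stylesheet's own ZIM path so it resolves offline. This is a minimal sketch under stated assumptions: the `url_to_zim_path` mapping and the `_assets_/...` paths in the example are hypothetical, and the actual layout used by libretexts2zim may differ. References not found in the mapping, such as `data:` URIs, are left untouched.

```python
from __future__ import annotations

import posixpath
import re
from urllib.parse import urljoin

# Same simplified url(...) matcher as used for asset discovery (hypothetical)
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_urls(
    css: str,
    stylesheet_url: str,
    stylesheet_zim_path: str,
    url_to_zim_path: dict[str, str],
) -> str:
    """Point every known url() at the asset's ZIM path, relative to the stylesheet."""
    stylesheet_dir = posixpath.dirname(stylesheet_zim_path)

    def replace(match: re.Match[str]) -> str:
        raw_url = match.group(2).strip()
        absolute = urljoin(stylesheet_url, raw_url)
        target = url_to_zim_path.get(absolute)
        if target is None:
            return match.group(0)  # data: URIs and unknown assets stay as-is
        relative = posixpath.relpath(target, stylesheet_dir or ".")
        return f'url("{relative}")'

    return CSS_URL_RE.sub(replace, css)
```

Relative paths keep the rewritten stylesheet independent of the ZIM's URL prefix. An absolute ZIM path would also work if the reader's layout is known.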

It does not yet handle inline CSS, which is also needed and will be addressed in step 2.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 68.51852% with 34 lines in your changes missing coverage. Please review.

Project coverage is 51.48%. Comparing base (733c35a) to head (4749161). Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| scraper/src/libretexts2zim/processor.py | 8.69% | 21 Missing :warning: |
| scraper/src/libretexts2zim/client.py | 35.71% | 9 Missing :warning: |
| scraper/src/libretexts2zim/css.py | 93.22% | 2 Missing and 2 partials :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
+ Coverage   47.22%   51.48%   +4.25%
==========================================
  Files           7        9       +2
  Lines         432      540     +108
  Branches       45       61      +16
==========================================
+ Hits          204      278      +74
- Misses        226      258      +32
- Partials        2        4       +2
```
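The percentages in this report can be cross-checked from the raw counts. Codecov documents its coverage ratio as hits / (hits + misses + partials). The sketch below applies that formula with a hypothetical helper, `coverage_percent`. The patch figures (74 covered out of 108 lines) are inferred from the reported 68.51852% with 34 uncovered lines (32 missing plus 2 partials), not stated directly by the bot.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Codecov-style coverage ratio, hits / (hits + misses + partials), in percent."""
    return 100 * hits / (hits + misses + partials)


base = coverage_percent(204, 226, 2)   # main: 204 / 432 lines
head = coverage_percent(278, 258, 4)   # #29: 278 / 540 lines
patch = coverage_percent(74, 32, 2)    # changed lines: 74 / 108
```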


benoit74 commented 1 week ago

> How much of the CSS Processor is based on warc2zim? Once this has matured enough, I think it should move to scraperlib.

That is indeed my plan. So far it is largely copied from warc2zim, with modifications for the specificities of libretexts. The same will happen with HTML rewriting, which will be needed for images, videos, links, etc. Neither feature is specific to warc2zim, and their primitives should definitely be shared in python-scraperlib; I'm sure we will need them again in other scrapers.