openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Conversion of SVG to PNG is failing / not proceeding correctly #148

Open benoit74 opened 6 months ago

benoit74 commented 6 months ago

From the 100r.co ZIM, it looks like the conversion of an SVG illustration to PNG is failing, or at least not producing a usable result.

See https://dev.library.kiwix.org/#lang=&q=grid, where there is no icon displayed (while other ZIMs created with the same zimit2 docker image are ok).

Recipe configuration: https://farm.openzim.org/recipes/100r.co/config

I don't know which tool I could use (aside from the C++ source code, which I'm not at all fluent in) to manually extract the Illustration and check what is actually inside the ZIM.

benoit74 commented 6 months ago

@mgautierfr could you have a look, please? I'm a bit stuck because I don't know how to extract the illustration to at least check it.

rgaudin commented 6 months ago

It's the 48x48px transparent PNG that ships in scraperlib for dev purposes, which warc2zim uses as a fallback.

ziminfo.py --base64 --debug ~/Downloads/100r-off-the-grid_en_2024-01.zim
ZIM Info for /Users/reg/Downloads/100r-off-the-grid_en_2024-01.zim
Properties
  - UUID: cb87b66b-3c64-b207-0c7e-96a13e6db7b3
  - Main Entry: mainPage (100r.co/site/off_the_grid.html)
  - New NS scheme: True
  - Multipart: False
  - Has Full-Text Index: True
  - Has Title Index: True v0, v1
  - Checksum: 0178f9844fb975772d05789a47d5c9b7
  - Entry Count: 1662
  - All Entry Count: 1679
  - Article Count: 301
  - Media Count: 1277
  - Illustration sizes: {48}
Metadata:
 - Counter: application/json=45;application/json+protobuf=2;application/x-javascript=2;application/xml=1;font/woff2=2;image/jpeg=1165;image/png=100;image/svg+xml=2;text/css=6;text/html=301;text/javascript=18;text/plain=8;video/mp4=10
 - Creator: -
 - Date: 2024-01-15
 - Description: A knowledge repository to live off the grid
 - Illustration_48x48@1: image/png (135 bytes) binary: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAANSURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC
 - Language: eng
 - Name: 100r-off-the-grid_en
 - Publisher: Kiwix
 - Scraper: warc2zim 1.5.4
 - Source: https://100r.co/site/off_the_grid.html
 - Tags: _ftindex:yes;_category:other;_sw:yes;preppers
 - Title: Off the Grid
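
The base64 payload on the Illustration_48x48@1 line can be inspected without any ZIM tooling. Below is a minimal sketch using only the Python standard library; `describe_png` is a hypothetical helper name for illustration, not part of ziminfo or scraperlib. It walks the PNG chunks and reports the dimensions, the color type and any transparency values.

```python
import base64
import struct

# Illustration_48x48@1 payload, copied verbatim from the ziminfo output above.
ILLUSTRATION_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0"
    "d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0"
    "Uk5TAEDm2GYAAAANSURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC"
)

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_png(data: bytes) -> dict:
    """Return basic facts about a PNG by walking its chunk list (hypothetical helper)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    info = {"size_bytes": len(data), "chunks": []}
    offset = 8
    while offset + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        name = chunk_type.decode("ascii")
        info["chunks"].append(name)
        if name == "IHDR":
            width, height, bit_depth, color_type = struct.unpack(">IIBB", body[:10])
            info.update(width=width, height=height,
                        bit_depth=bit_depth, color_type=color_type)
        elif name == "tRNS":
            # For palette images, one alpha value per palette entry.
            info["alpha_values"] = list(body)
        offset += 12 + length  # length field + type + data + CRC
    return info


if __name__ == "__main__":
    print(describe_png(base64.b64decode(ILLUSTRATION_B64)))
```

Run against the payload above, this reports a 48x48, 1-bit palette image (color type 3) whose single palette entry has alpha 0. That matches a fully transparent placeholder rather than a converted logo.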

benoit74 commented 6 months ago

I had to read the log three times to find the issue:

[DEBUG] Favicon: http://100r.co/media/interface/logo.svg
[WARNING] Failed to convert or resize favicon: cannot identify image file <_io.BytesIO object at 0x7f614bd514e0>

benoit74 commented 6 months ago

SVGs are simply not supported by Pillow: https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html

@rgaudin: is this intended / a known limitation?
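
For illustration, here is one way a scraper could spot an SVG favicon before handing it to Pillow and rasterize it separately. This is only a sketch under stated assumptions, not what warc2zim or scraperlib actually do: `looks_like_svg` and `svg_to_png` are hypothetical helper names, and the conversion step assumes the third-party CairoSVG package is installed.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(data: bytes) -> bool:
    """Return True if the payload parses as XML with an <svg> root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Binary formats such as PNG or JPEG are not well-formed XML.
        return False
    return root.tag in ("svg", f"{{{SVG_NAMESPACE}}}svg")


def svg_to_png(svg_data: bytes, size: int = 48) -> bytes:
    """Rasterize an SVG to a size x size PNG (assumes CairoSVG is available)."""
    import cairosvg  # third-party dependency, imported lazily on purpose

    return cairosvg.svg2png(
        bytestring=svg_data, output_width=size, output_height=size
    )
```

With such a check in place, the favicon bytes would go through `svg_to_png` when `looks_like_svg` returns True, and through the existing Pillow resize path otherwise.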

benoit74 commented 6 months ago

I've found https://github.com/openzim/python-scraperlib/issues/113 and https://github.com/openzim/python-scraperlib/issues/80, but I'm not sure they are 100% related.

benoit74 commented 6 months ago

In the meantime, I manually converted the SVG to PNG and pushed it to the Zimfarm drive.

benoit74 commented 4 weeks ago

This indeed has to be implemented in https://github.com/openzim/python-scraperlib/issues/113 (hopefully it will be done in 3.4.0).