openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

SVG images do not display on kiwix-serve/chrome extension #42

Closed: satyamtg closed this issue 4 years ago

satyamtg commented 4 years ago

In the ZIM created by this run, SVGs do not display correctly (other image formats display fine), even though the paths are correct and the files are present. Testing the same ZIM on Kiwix Android, the SVGs display flawlessly. Here's a screenshot of how it looks on kiwix-serve: [Screenshot from 2020-08-07 15-18-17]

On the previous run (this one), SVGs displayed properly. However, that run used zimwriterfs, while the newer one uses pylibzim.

rgaudin commented 4 years ago

The difference that could matter here is the MIME type…

satyamtg commented 4 years ago


Indeed. Comparing the Content-Type header in the previous ZIM (built with zimwriterfs) and in this one (built with pylibzim), we can clearly see that it used to be image/svg+xml, whereas now it's image/svg, which is not a valid type (the registered MIME type for SVG is image/svg+xml). This might be an upstream issue.

satyamtg commented 4 years ago

After looking into this, I have found the root cause of the problem. It's not an error in this scraper; it has to do with Docker. The Docker container runs on a Debian image, and there libmagic identifies SVG files as image/svg. That is inherent to the magic.mgc database, which differs between platforms.

To fix this, I think we can go one of the following ways:

  • Fix this in scraperlib by converting image/svg to image/svg+xml (we can create a mapping for such MIME types; if something else pops up in the future, we just add it to the mapping)

  • Fix this in Docker (I tried updating and installing packages, without luck. I think the only option there is to build the magic database from source, which may have its own issues)

I would suggest we transfer this issue to scraperlib and fix it there (the first option above). I also tried a pure-Python alternative to libmagic called puremagic, but its results seem way off (at least for SVG images).

BTW, @rgaudin, do you know how zimwriterfs deals with this?

rgaudin commented 4 years ago

We already have such a mapping, I believe, so it would just be a matter of adding an entry. That sounds like the best way to go.
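A minimal sketch of the mapping approach proposed in this thread: normalize whatever MIME type libmagic reports before it is passed to the ZIM writer. The names `MIME_OVERRIDES` and `normalize_mimetype` are hypothetical and do not reflect scraperlib's actual API; where such a mapping lives in scraperlib is not shown here.

```python
# Hypothetical mapping (not scraperlib's real API): non-standard types that
# some libmagic databases report -> the registered type readers expect.
MIME_OVERRIDES = {
    "image/svg": "image/svg+xml",
}


def normalize_mimetype(mimetype):
    """Return the canonical MIME type for `mimetype` (hypothetical helper).

    Only the base type is looked up (case-insensitively); any parameters
    such as "; charset=utf-8" are kept unchanged.
    """
    base, sep, params = mimetype.partition(";")
    key = base.strip().lower()
    canonical = MIME_OVERRIDES.get(key, base.strip())
    return canonical + sep + params if sep else canonical


print(normalize_mimetype("image/svg"))  # image/svg+xml
print(normalize_mimetype("image/png"))  # image/png
```

With this design, handling another platform-specific quirk later is a one-line addition to the dictionary, which matches the "just adding an entry" suggestion above.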