openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0
62 stars 19 forks source link

Segfault adding a duplicate illustration #141

Closed rgaudin closed 2 years ago

rgaudin commented 2 years ago

I believe it can be considered a bug that we now (libzim 7.2.1) raises a RuntimeError when adding a duplicate of Entry, Metadata or Redirection but not for Illustration. Trying to add another (or the same) illustration for the same size segfaults.

Not sure if this gets fixed in here or at libzim level though.

kelson42 commented 2 years ago

Seems important an urgent to handle quickly IMO. Otherwise it will die quite often.

mgautierfr commented 2 years ago

This is the consequence of https://github.com/openzim/libzim/pull/690/commits/1f27949a80b1f4a9d49c2749fc513b9853f1bcfe We will probably not change this in libzim. python wrapper must catch the exception and handle it correctly (this is a bug in the scapper if it adds twice the "same" entry)

rgaudin commented 2 years ago

Yes, all this is understood. pylibzim should catch it and raise in python-world instead of segfaulting. We have no scraper impacted by that ATM but adding tests for all the other duplicate use cases, I realized Illustration doesn't raise but segfaults instead.

kelson42 commented 2 years ago

@mgautierfr @rgaudin So we should open a new ticket and make a new libzim release?

rgaudin commented 2 years ago

@mgautierfr @rgaudin So we should open a new ticket and make a new libzim release?

No, I believe it's already raising properly on libzim and only pylibzim needs to be adapted