openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

Incorrect getFilesize() result #137

Closed rgaudin closed 2 years ago

rgaudin commented 2 years ago

The first iFixit ZIM file is 2.22 GiB.

When retrieving the file size reported by libzim (via getFilesize()), I get -1910077255, which is negative 🤨, and even its absolute value is incorrect: 1910077255 bytes is 1.78 GiB.

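import pathlib
from libzim.reader import Archive

zim_path = pathlib.Path("ifixit_fr_all_2022-04.zim")
Archive(zim_path).filesize  # size as reported through the wrapper
# -1910077255
zim_path.stat().st_size  # size as reported by the filesystem
# 2384890041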

Note: I am using python-libzim to access this information, so a wrapper bug is possible, although the wrapper code looks straightforward.

veloman-yunkan commented 2 years ago

This is due to 32-bit overflow: 1910077255 == 2**32 - 2384890041
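The arithmetic above can be checked with a minimal sketch in plain Python. The helper names `as_int32` and `as_uint32` are hypothetical and only illustrate two's-complement narrowing; they are not part of libzim or python-libzim.

```python
def as_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed two's-complement integer."""
    return (n + 2**31) % 2**32 - 2**31


def as_uint32(n: int) -> int:
    """Reinterpret the low 32 bits of n as an unsigned integer."""
    return n % 2**32


actual_size = 2384890041           # st_size from the report above
reported = as_int32(actual_size)   # what a signed 32-bit return type yields

print(reported)                    # -1910077255
print(abs(reported) == 2**32 - actual_size)  # True
print(as_uint32(reported))         # 2384890041
```

Recovering the size with `as_uint32` only works for files smaller than 4 GiB, since the upper 32 bits are already lost after narrowing.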

The return type of zim::Archive::getFilesize() is zim::size_type which is a 64-bit unsigned integer:

https://github.com/openzim/libzim/blob/01081e0e0bc478947028f5f00c159c2475b96f7c/include/zim/archive.h#L118

https://github.com/openzim/libzim/blob/01081e0e0bc478947028f5f00c159c2475b96f7c/include/zim/zim.h#L51

Thus the narrowing of the integer type occurs in libzim.pyx

mgautierfr commented 2 years ago

This is indeed in the Python wrapper. getFilesize (https://github.com/openzim/python-libzim/blob/master/libzim/zim.pxd#L134) should be declared as uint64_t getFilesize() except +