openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

tests/*.zim files not included in release tarball #68

Closed: legoktm closed this issue 4 years ago

legoktm commented 4 years ago

If you download https://files.pythonhosted.org/packages/26/8e/201f1ed560f83f5fd4a87fe3ec52960d078f1a31c8f1670ac16cb92fe3a3/libzim-0.0.3.post0.tar.gz it's missing the two zim files in tests/ so the tests fail. I believe we need a MANIFEST.in file for this.
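To make the report easy to reproduce, here is a minimal sketch that checks a source tarball for the expected test files. The helper name `missing_from_sdist` and the glob patterns are illustrative assumptions, not part of python-libzim.

```python
import fnmatch
import tarfile


def missing_from_sdist(tarball_path, patterns):
    """Return the glob patterns that match no regular file in the tarball."""
    with tarfile.open(tarball_path) as tar:
        # sdist members are prefixed with a top-level "<name>-<version>/"
        # directory; strip it so patterns are relative to the project root.
        names = [m.name.split("/", 1)[-1] for m in tar.getmembers() if m.isfile()]
    return [p for p in patterns if not fnmatch.filter(names, p)]
```

Calling `missing_from_sdist("libzim-0.0.3.post0.tar.gz", ["tests/*.zim"])` on the downloaded file returns the patterns that have no match, so a non-empty result means the release is missing those files.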

Also, from a licensing standpoint, it would be nice if we could use ZIM files with fully public domain content so that I don't have to keep track of the specific copyright status for Debian.

legoktm commented 4 years ago

Also, if we used different ZIM files, we could make them much smaller; they're 33M right now.

kelson42 commented 4 years ago

@rgaudin @legoktm Maybe we should publish the tarball on download.openzim.org/release?

rgaudin commented 4 years ago

I also think we should not include 33M of ZIM files for tests.

I think it's important that we test with ZIM files not created with pylibzim, but there are many small ones available. We could also have a longer test that downloads a larger ZIM file if we think it's relevant. I think there were issues with large numbers of articles in the very early days of this.

kelson42 commented 4 years ago

I'm in favour of downloading ZIM files if they are more than 100KB.
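A rough sketch of what downloading on demand could look like, assuming the test suite caches each file locally after the first fetch. The helper name `ensure_zim` and the cache layout are hypothetical, and which files fall above the 100KB threshold is left to the caller.

```python
import pathlib
import urllib.request


def ensure_zim(url, cache_dir):
    """Return a local path for url, downloading it only on first use."""
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download under a temporary name first so an interrupted
        # transfer never leaves a truncated file in the cache.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target
```

Files small enough to ship would stay in tests/, and anything larger would go through a helper like this, so the tarball stays small and the network is only needed for the longer tests.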

legoktm commented 4 years ago

With my Debian hat on:

Avoiding large ZIM files in the tarball would be nice too, but the other two are more important.

legoktm commented 4 years ago

Might also be worth looking at what ZIMs kiwix-lib is using: https://github.com/kiwix/kiwix-lib/tree/master/test/data. wikipedia_en_ray_charles_mini_2020-03.zim is ~550k which seems much more reasonable.

legoktm commented 4 years ago

I created #76 as an alternative solution; it just skips those tests if the file is missing.
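For readers unfamiliar with the approach, skipping on a missing file can be sketched with the standard library's `unittest` as below. This is not the code from #76; the path `tests/example.zim` and the test names are placeholders.

```python
import pathlib
import unittest

# Hypothetical path; the actual test files in the repository are named differently.
ZIM_FILE = pathlib.Path("tests") / "example.zim"


class ReaderTests(unittest.TestCase):
    # The condition is evaluated once, when the module is imported.
    @unittest.skipUnless(ZIM_FILE.exists(), f"{ZIM_FILE} is not available")
    def test_file_is_not_empty(self):
        self.assertGreater(ZIM_FILE.stat().st_size, 0)
```

When the file is absent, the test is reported as skipped rather than failed, so a build from the release tarball can still run the rest of the suite.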