openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0

Better opening of split zim archive. #879

Closed mgautierfr closed 2 months ago

mgautierfr commented 3 months ago

The "API" for opening split files is a bit buggy:

However, if it is given the path foo.zimaa directly, it succeeds in opening the file and so doesn't instantiate a MultiPart file reader (so only foo.zimaa is read, and reading the full archive is broken). If both foo.zim and foo.zim[a-z][a-z] are present, you cannot read the split files: either you pass the .zim path and open the non-split file, or you pass foo.zimaa and it is broken.

We should allow passing *.zimaa as a valid path.
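A minimal sketch of how such a path could be recognized, assuming split parts are always named with the .zim extension followed by two lowercase letters, as in the foo.zim[a-z][a-z] pattern above. The helper name `splitBasePath` is hypothetical and is not part of libzim:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not a libzim function): if `path` ends in ".zim"
// followed by exactly two lowercase letters (e.g. "foo.zimaa"), return the
// base path ("foo.zim"). Otherwise return std::nullopt.
std::optional<std::string> splitBasePath(const std::string& path) {
    const std::string ext = ".zim";
    const std::size_t suffixLen = 2;
    if (path.size() < ext.size() + suffixLen) {
        return std::nullopt;
    }
    const std::size_t extPos = path.size() - suffixLen - ext.size();
    if (path.compare(extPos, ext.size(), ext) != 0) {
        return std::nullopt;
    }
    for (std::size_t i = path.size() - suffixLen; i < path.size(); ++i) {
        if (path[i] < 'a' || path[i] > 'z') {
            return std::nullopt;
        }
    }
    // Drop the two-letter part suffix to recover the base name.
    return path.substr(0, path.size() - suffixLen);
}
```

With something like this, an opener could treat a foo.zimaa argument as a request to read the split archive based at foo.zim, instead of opening that single part as if it were a complete file.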

On top of that, we could also check that the size of the file we opened corresponds to what is declared in the archive header. This could be used to detect this case, and also to detect when the ZIM file is still being downloaded and so is incomplete.
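A rough sketch of such a size check. It relies on assumptions taken from my reading of the openZIM format description, not from libzim's code: the header stores a little-endian 64-bit `checksumPos` at byte offset 72, and a 16-byte MD5 checksum at that position ends the archive, so the declared size would be `checksumPos + 16`. The function and enum names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed layout (to be verified against the ZIM specification):
// checksumPos is a little-endian uint64 at offset 72 of the header,
// and the archive ends with a 16-byte MD5 checksum at that position.
constexpr std::size_t kChecksumPosOffset = 72;
constexpr std::uint64_t kChecksumSize = 16;

// Hypothetical helper: compute the archive size declared by the header bytes.
std::optional<std::uint64_t> declaredArchiveSize(const unsigned char* header,
                                                 std::size_t len) {
    if (len < kChecksumPosOffset + 8) {
        return std::nullopt;  // not enough bytes to contain checksumPos
    }
    std::uint64_t checksumPos = 0;
    for (int i = 7; i >= 0; --i) {  // assemble the little-endian value
        checksumPos = (checksumPos << 8) | header[kChecksumPosOffset + i];
    }
    return checksumPos + kChecksumSize;
}

enum class SizeCheck { Ok, Truncated, Oversized };

// Compare the declared size with the size actually available on disk
// (for a split archive, presumably the sum of all part sizes).
SizeCheck checkSize(std::uint64_t declared, std::uint64_t actual) {
    if (actual == declared) return SizeCheck::Ok;
    return actual < declared ? SizeCheck::Truncated : SizeCheck::Oversized;
}
```

A `Truncated` result would cover both situations mentioned above: only the first part of a split archive was opened, or the file is still being downloaded.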

MohitMaliFtechiz commented 3 months ago

If both foo.zim and foo.zim[a-z][a-z] are present, you cannot read the split files: either you pass the .zim path and open the non-split file, or you pass foo.zimaa and it is broken.

@mgautierfr Yes, it is broken; I have tested this scenario. When both the .zim and .zimaa files are in storage and we pass the .zim file to libzim, it loads that file normally. When we try to load the .zimaa file, it shows the error "Dirent pointer table outside (or not fully inside) ZIM file." Following your point, I deleted the .zim file from my storage and tried to load the .zimaa file again; it showed the same error as before. It seems libzim is instantiating a single-file reader instead of a MultiPart file reader for split ZIM files as well, since libzim shows this error for ZIM files that are broken.
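For illustration, a sketch of how the parts of a split archive could be enumerated so that a multi-part reader sees all of them. It assumes parts are numbered consecutively as aa, ab, ..., zz after the base name, matching the foo.zim[a-z][a-z] pattern. The function `collectParts` and its `exists` predicate are hypothetical and do not describe libzim's actual implementation:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: list consecutive parts base+"aa", base+"ab", ...
// and stop at the first part that does not exist. `exists` is injected so
// the logic can be exercised without touching the filesystem.
std::vector<std::string> collectParts(
    const std::string& base,
    const std::function<bool(const std::string&)>& exists) {
    std::vector<std::string> parts;
    for (char a = 'a'; a <= 'z'; ++a) {
        for (char b = 'a'; b <= 'z'; ++b) {
            std::string candidate = base;
            candidate += a;
            candidate += b;
            if (!exists(candidate)) {
                return parts;  // first gap ends the sequence
            }
            parts.push_back(candidate);
        }
    }
    return parts;
}
```

In real use the predicate could be a lambda wrapping `std::filesystem::exists`. If the result contains more than one part but only the first was opened, that would match the "Dirent pointer table outside ZIM file" symptom described here.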

kelson42 commented 3 months ago

@mgautierfr This seems mandatory to fix https://github.com/kiwix/kiwix-android/issues/3605. We should create milestone 9.2.1, as we seem to have a regression around chunk mgmt.