kiwix / kiwix-tools

Command line Kiwix tools: kiwix-serve, kiwix-manage, ...
https://download.kiwix.org/release/kiwix-tools/
GNU General Public License v3.0

kiwix-serve needs a way to tell downloaders where they are allowed to cut the files #287

Closed: kelson42 closed this issue 2 years ago

kelson42 commented 5 years ago

Not all filesystems are able to deal with our big ZIM files. This is why the openZIM format allows the files to be cut anywhere, and the libzim can deal with it. Unfortunately, we have introduced content in the ZIM files which cannot be split, because it is accessed directly by non-ZIM readers. This is the case, for example, for the Xapian full-text search engine, but also for video readers. Therefore, we cannot cut the ZIM files at any place any more. We need to keep the integrity of a few clusters (and, by default, keep all of them uncut).
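To make the constraint concrete, here is a minimal Python sketch. It assumes the file is modelled only by the [start, end) byte ranges of the clusters that must stay whole (e.g. a Xapian index or a video); `is_safe_cut` and `split_ranges` are hypothetical helpers, not part of libzim.

```python
from typing import List, Tuple

# Hypothetical model: [start, end) byte range of a cluster that must stay whole.
Range = Tuple[int, int]


def is_safe_cut(offset: int, protected: List[Range]) -> bool:
    """Return True if cutting at `offset` does not split any protected range."""
    # A cut strictly inside a protected range would leave part of it in each piece;
    # a cut exactly on a range boundary is fine.
    return all(not (start < offset < end) for start, end in protected)


def split_ranges(file_size: int, offsets: List[int]) -> List[Range]:
    """Turn cut offsets into the [start, end) byte range of each resulting part."""
    bounds = [0] + sorted(offsets) + [file_size]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

For example, with a protected range (100, 250), a cut at 200 is rejected while cuts at 50 or 250 are accepted.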

This is a problem for the Kiwix integrated downloaders dealing with library.kiwix.org, because they have a double constraint: (1) do not download big files, and (2) do not cut anywhere. Yet they have no way of knowing where they are allowed to cut the file (= download it in ZIM file chunks).

For that reason, we should offer a way in library.kiwix.org (kiwix-serve) to know where they could cut the file. I propose that kiwix-serve delivers a list of ZIM file offsets where the file can be cut. The granularity would be at least 1 GB; if a cluster is bigger than 1 GB, the corresponding chunk would be bigger. By reading the list, the downloader would be able to see which chunk is the biggest and then act accordingly (maybe display an error saying that this file cannot be downloaded at all).
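The proposal can be sketched as a greedy pass over the allowed cut points. This assumes the offsets of cluster boundaries (the only places where a cut is allowed) are already known, and reads "1 GB" as 1 GiB; `cut_offsets` and `biggest_chunk` are hypothetical names, not an existing kiwix-serve or libzim API.

```python
from typing import List

GB = 1024 ** 3  # assumption: "1 GB" interpreted as 1 GiB


def cut_offsets(cluster_starts: List[int], file_size: int, min_chunk: int = GB) -> List[int]:
    """Pick cut offsets among cluster boundaries so each chunk is >= min_chunk.

    A cluster larger than min_chunk simply yields a larger chunk. The last
    chunk may be smaller than min_chunk, since nothing follows it.
    """
    cuts: List[int] = []
    last = 0
    for start in sorted(cluster_starts):
        # Cut at this boundary only once the current chunk is big enough
        # and the cut falls strictly inside the file.
        if start - last >= min_chunk and 0 < start < file_size:
            cuts.append(start)
            last = start
    return cuts


def biggest_chunk(cuts: List[int], file_size: int) -> int:
    """Size of the largest chunk produced by cutting at `cuts`."""
    bounds = [0] + cuts + [file_size]
    return max(b - a for a, b in zip(bounds, bounds[1:]))
```

A downloader would then compare `biggest_chunk(...)` with the largest file its target storage accepts, and show an error if even that chunk is too big.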

kelson42 commented 5 years ago

@mgautierfr @mhutti1 @macgills Would that solve our problem perfectly? I recommend having a look at zimsplit, a tool implementing that kind of logic, in the zim-tools repo (see https://github.com/openzim/zim-tools/issues/1).

mhutti1 commented 5 years ago

I don't see why this wouldn't work.

mgautierfr commented 5 years ago

Yes, I agree with the main idea. However, I would prefer not to add an extra API to maintain in kiwix-serve. There is already an API to get information about books and how to download them (the OPDS stream, or maybe the metalink file), and I would rather use (and extend) it. There is already an issue on kiwix-lib (https://github.com/kiwix/kiwix-lib/issues/209) to revamp the OPDS stream API; the "split offset list" could (and preferably would) be put in the entry info then.

kelson42 commented 5 years ago

@mgautierfr @mhutti1 @macgills Agreed. What about creating a ZIM article at the URL m/chunks containing something like offset1,offset2,...? It could then be delivered easily and directly via kiwix-serve.
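If such an m/chunks article existed, a client could consume its "offset1,offset2,..." body roughly as follows. The validation rules (integers, strictly increasing, strictly inside the file) are assumptions, and `parse_chunk_offsets` and `chunk_sizes` are hypothetical helpers.

```python
from typing import List


def parse_chunk_offsets(text: str, file_size: int) -> List[int]:
    """Parse a comma-separated offset list such as "offset1,offset2,...".

    Raises ValueError if an offset is not an integer, is not strictly
    increasing, or does not lie strictly inside the file.
    """
    offsets = [int(part) for part in text.strip().split(",") if part.strip()]
    previous = 0
    for offset in offsets:
        if not previous < offset < file_size:
            raise ValueError(f"invalid cut offset: {offset}")
        previous = offset
    return offsets


def chunk_sizes(offsets: List[int], file_size: int) -> List[int]:
    """Size in bytes of each chunk delimited by the cut offsets."""
    bounds = [0] + offsets + [file_size]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```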

macgills commented 5 years ago

@kelson42 With the way the new downloader works, the parts are fairly irrelevant to the app, as it is simply requesting the one large file. From cursory googling, I read that an SD card of 64 GB or more is likely to be exFAT and can handle files up to 2TB, so, from an app perspective, I don't think it is too much to ask a user who wants to download our files externally to purchase a compatible card. Since Android 2.3, the internal filesystem of Android is ext4, which has a max file size of 1.15 exabytes, so if we have the space it is definitely fine to store it there. This discussion might be pertinent for other clients, but speaking for the Android app I would judge it unnecessary at this time.

mgautierfr commented 5 years ago

> What about creating a ZIM article at the URL m/chunks which would contain something like offset1,offset2,.... This would be then easily/directly deliverable via kiwix-serve?

It is not possible. The offsets depend on the content, and we would need to know the offsets to create the article, which itself is needed to create the ZIM file.

kelson42 commented 5 years ago

@mgautierfr What about writing it in the ZIM header then?

kelson42 commented 5 years ago

@macgills A lot of people have FAT32 SD cards (and Android devices which don't support exFAT). The reason, AFAIK, is that Android integrators do not want to pay Microsoft for the exFAT license. We also have an audience which tends to have old/cheap Android devices.
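For a FAT32 card, the downloader's decision could look like the sketch below. FAT32 cannot store a file of 4 GiB or more (the maximum is 2^32 - 1 bytes); `download_plan` and its three return values are hypothetical, and `chunk_sizes` is assumed to come from a cut-offset list like the one proposed in this issue.

```python
from typing import List

FAT32_MAX_FILE_SIZE = 2**32 - 1  # largest file FAT32 can store: 4 GiB minus one byte


def download_plan(file_size: int, chunk_sizes: List[int]) -> str:
    """Decide how a ZIM file could be stored on a FAT32 card (illustrative only)."""
    if file_size <= FAT32_MAX_FILE_SIZE:
        return "whole"  # fits as a single file, no splitting needed
    if chunk_sizes and max(chunk_sizes) <= FAT32_MAX_FILE_SIZE:
        return "split"  # every allowed chunk fits on its own
    return "impossible"  # no cut list, or an uncuttable chunk exceeds the limit
```

The "impossible" case corresponds to the error the original proposal suggests showing when the file cannot be downloaded at all.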

macgills commented 5 years ago

@kelson42 Yes, I encountered this after writing that comment. There are actually two use cases for an SD card: to expand your device's storage, or for file transfers. In the first scenario, you put the SD card in your device "permanently" and it gets formatted as system storage, i.e. it can handle files larger than 4 GB. In the second scenario, you are going to use that SD card to transfer content between your devices (phone to PC, and vice versa), so it gets formatted as FAT32 and cannot handle files larger than 4 GB. This is off topic for this discussion, though.

mhutti1 commented 5 years ago

Makes sense as FAT32 has support from the major operating systems.

mgautierfr commented 5 years ago

> @mgautierfr What about writing it in the ZIM header then?

Same problem. Even worse, as we would have to change the header format.

kelson42 commented 2 years ago

With exFAT becoming widespread, the need to split ZIM files is, to me, less and less pressing. In addition, there are a number of scenarios around indexes/videos where the handling is complex. I think we will never implement this.