openzim / zim-tools

Various ZIM command line tools
https://download.openzim.org/release/zim-tools/
GNU General Public License v3.0

zimsplit leaves huge 16GB last file with warning on wikipedia_en_all_nopic_2022-01.zim #295

Closed ballerburg9005 closed 2 years ago

ballerburg9005 commented 2 years ago

When I use zimsplit on this 47GB Wikipedia file, it correctly creates chunks of the requested size but then always leaves one 16GB file at the end.

opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimaa
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimab
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimac
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimad
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimae
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimaf
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimag
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimah
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimai
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimaj
WARNING: Part /mnt/2/wikipedia_en_all_nopic_2022-01.zimaj is bigger that max part size. (16828436877>4000000000)
opening new file /mnt/2/wikipedia_en_all_nopic_2022-01.zimak
zimsplit --size=4000000000 /mnt/2/wikipedia_en_all_nopic_2022-01.zim 8.15s user 107.82s system 9% cpu 20:35.43 total

My guess is that there is some reason for this last chunk ending up at 16GB. But since the point of zimsplit is primarily to make chunks that fit on FAT32, leaving even one file larger than 4GB defeats the purpose, whatever the reason.

It does not matter what options and sizes I run zimsplit with. It always creates this last file with 16GB.

Can someone explain this and how to work around it?

I am already using zimsplit as a sort of desperation move, because stock Android only accepts SD cards formatted as FAT32. Nothing else, such as ext4, exFAT or NTFS, will work, no matter how you partition the card, what UUIDs you set or what else you do. If you format the card as "internal storage" from the phone, it ends up encrypted and does not work on other devices, unless you used root privileges on the phone to dump the encryption key, and even then you can only mount the SD card in a sort of forensic manner from a Linux PC. Rooting requires you to wipe the phone, and with many brands of phones it involves using unofficial recovery images from shady, untrusted sources, which is also true of installing a different Android ROM (if any exists at all), so this route is impractical and unsafe. The whole point of an SD card is to be able to use it in new phones and other devices, or at the bare minimum as a backup drive. With an encrypted SD card that you cannot decrypt on any device but the original phone, the data is simply lost if the phone fails. This basically leaves FAT32 as the only option for SD cards for most Android users (I think Samsung is one of the few brands that implemented custom support for exFAT and for using the SD card unencrypted as internal storage, but this customization might only be present in certain Samsung phones from certain time periods). There are absolutely no workarounds for this; I checked for hours.

I don't really understand how zimsplit works and what it does. Maybe you can just split the files with unix tools instead? I am under the impression that it does more than just that.

kelson42 commented 2 years ago

You have a big chunk because the search index is of that size and cannot be split. There is no solution to this problem besides having an Android device which supports exFAT.

mgautierfr commented 2 years ago

zimsplit splits a zim file at "internal boundaries" of the zim file. But as @kelson42 said, some content (the search index) is pretty big and cannot be cut. You can still cut the zim file using unix tools, but you will have a performance penalty and the search index will not be available. If you don't want the performance penalty, you can use zimsplit first and then use unix tools to split only the part that is too big (and rename the zim part files correctly). But if you cut the index, search will not be available (you MUST still keep it, though).
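
The last suggestion (run zimsplit first, then byte-split only the oversized part and rename the parts) could look roughly like the Python sketch below. It is an illustration only, not part of zim-tools: `resplit` and `part_suffixes` are hypothetical helper names, and it assumes the parts must end up as one contiguous `aa`, `ab`, `ac`, ... sequence, as in the log above. Parts that already fit are copied unchanged; a part that is too big is cut into plain byte chunks, the same way a unix byte split would. As noted above, cutting through the search index means search will not be available.

```python
import itertools
import string
from pathlib import Path


def part_suffixes():
    """Yield 'aa', 'ab', ..., 'zz', the suffix order seen in the zimsplit log."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield a + b


def resplit(parts, out_dir, base_name, max_size, buf_size=1 << 20):
    """Copy `parts` (in order) into out_dir, byte-splitting any part above max_size.

    Output files are named base_name + 'aa', 'ab', ... so the sequence stays
    contiguous after an oversized part has been cut into several pieces.
    Returns the list of output paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    suffixes = part_suffixes()
    outputs = []
    for part in map(Path, parts):
        with part.open("rb") as fin:
            while True:
                data = fin.read(min(buf_size, max_size))
                if not data:
                    break  # this input part is exhausted
                out = out_dir / (base_name + next(suffixes))
                written = 0
                with out.open("wb") as fout:
                    while data:
                        fout.write(data)
                        written += len(data)
                        if written == max_size:
                            break  # output piece is full, start the next one
                        data = fin.read(min(buf_size, max_size - written))
                outputs.append(out)
    return outputs


# Example call (the output directory name is an assumption, not from the thread):
# parts = sorted(Path("/mnt/2").glob("wikipedia_en_all_nopic_2022-01.zima?"))
# resplit(parts, "/mnt/2/fat32", "wikipedia_en_all_nopic_2022-01.zim", 4_000_000_000)
```

Writing into a separate directory avoids name collisions while the later parts shift to higher suffixes, at the cost of needing free space for a second copy. The 4000000000-byte size used in the thread stays below the FAT32 per-file limit of 4 GiB minus one byte (4294967295 bytes).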