kiwix / overview

:balloon: Start here for current projects, how to get involved with offline projects and joining community calls. A resource for new and veteran members

zim files cannot be split properly, split files almost never work - mobile Kiwix dead - no Wikipedia #65

Closed ballerburg9005 closed 2 years ago

ballerburg9005 commented 2 years ago

From what I have gathered, the support of split zim files has been dropped entirely, because it didn't really work to begin with.

Here is what I know to be true about that:

  1. Regular Android phones do not support any file system other than FAT32 on SD cards and thus require ZIM files to be split into chunks smaller than 4GB in order to work at all. It is entirely untrue that exFAT, NTFS or even ext4 are available options in stock Android, in both recent and obsolete Android versions. Only custom ROMs may support different file systems, if that has been specifically added as a feature different from the default Android behavior, which is to reject any file system other than FAT32 (even though the kernel obviously has support). Those custom ROMs can be found e.g. on xda-developers.com; they void your warranty and they require you to root your phone, which for less popular phone brands is often not possible without resorting to shady untrusted recovery image blobs. Also, rooting your phone nowadays requires wiping the entire phone first. Very few manufacturers, perhaps only one or two such as Samsung, ship custom preinstalled ROMs modified to have NTFS and exFAT enabled for SD cards. This is highly unusual: even within the Samsung brand many phone models may not have this modification, and Samsung might drop it any time, e.g. when they switch to a newer codebase. Also highly unusual is how Samsung's custom ROMs handle encryption on SD cards. In regular unrooted Android, and I believe in all versions of it, if the phone has once been encrypted (which has been the default for most phones for years and can't be reversed), the SD card is automatically also encrypted if it is formatted not as FAT32 but as "internal storage" (i.e. ext4 inside a dmcrypt). In unrooted stock Android, there is no option or trick to decrypt a once encrypted SD card (but on certain newer Samsung phones there is).
Thus if formatted in this manner, the card can never be used on any external device no matter what operating system or tools you use, the data can never be backed up in a safe and direct way, and basically your SD card's data is lost if the phone dies or any other hardware problem arises. To most users, who store much more than just Wikipedia on their SD card, this is simply unacceptable, and it is why the card still needs to be FAT32 to this day on all phones that don't run modified ROMs.
  2. Almost all ZIM files (e.g. Wikipedia) cannot and could never be split into chunks smaller than 4GB. This is because they all contain certain internal parts, most notably the "search index", which grows in direct proportion to the size of the ZIM file and easily exceeds the 4GB limit, even in ZIM files as small as 12GB (e.g. Wikipedia mini). I was told that as an ugly hack, you can still use Unix tools to split those unsplittable parts. This would be necessary for the files to still work at all, but it renders those parts useless. So without the search index you can't search anything in the file anymore, which makes this hack highly unviable. Also, from reading other issues I got the strong impression that split file support has been dropped and left in the codebase only as an unmaintained remnant. I was not even able to open Wikipedia mini with Kiwix-desktop after I split the 12GB ZIM file properly and regularly with zim-tools, leaving the 5GB part at the end (the search index) as it was. As I understand it, Kiwix has long been in a state where it is not expected to reliably deal with any sort of splitting of files.
  3. This leaves only one conclusion: Generally speaking large ZIM files do not work on Android, Kiwix does not work without ZIM files. Mobile Kiwix is dead for Wikipedia and Project Gutenberg.
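
As a rough illustration of the 4GB constraint in points 1 and 2: FAT32 stores file sizes in a 32-bit field, so a single file can hold at most 2^32 - 1 bytes. The sketch below only does the arithmetic and assumes the sizes quoted here are decimal gigabytes; the function names are made up for illustration and are not part of Kiwix or zim-tools.

```python
# FAT32 stores file sizes in a 32-bit field, so one file is at most 2**32 - 1 bytes.
FAT32_MAX_FILE_BYTES = 2**32 - 1
GB = 10**9  # assumption: the sizes quoted in the thread are decimal gigabytes


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a single file of this size can be stored on FAT32."""
    return 0 <= size_bytes <= FAT32_MAX_FILE_BYTES


def min_parts_for_fat32(size_bytes: int) -> int:
    """Smallest number of parts such that every part fits on FAT32."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-size_bytes // FAT32_MAX_FILE_BYTES)


if __name__ == "__main__":
    print(min_parts_for_fat32(12 * GB))  # parts needed for a 12GB ZIM file
    print(fits_on_fat32(5 * GB))         # a 5GB unsplittable part on its own
```

A 12GB file needs at least three parts, and a 5GB part that must stay in one piece does not fit on FAT32 no matter how the rest is split.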

This is of course not denying that, as mentioned, people with custom ROMs and/or rooted phones can use NTFS on their SD cards and get Kiwix to work. You have probably heard people report being successful with that. But this is simply not a standard option. If you got the impression that the lack of exFAT or NTFS support is a fluke that occurs only with some Android brands, or only with older Android versions, this is not the case. The opposite is true: it is what happens on all Android phones with all Android versions, unless the phone ROM has been specifically modded to eliminate this problem.

People with regular phones and regular Android can't use Kiwix to read Wikipedia or any other larger ZIM files from SD cards, because the cards cannot be directly formatted with anything but FAT32, and there are no workarounds to that.

I just wanted to write this issue to clarify the facts of this situation, and to ask whether the team would be willing to address it, or what it would take for outside devs to fix it. I would be very happy to see Wikipedia make a return to mobile Kiwix, since that is by far the #1 use case for it. Damn, it was even developed just for this purpose. There must be something we can do to bring it back.

kelson42 commented 2 years ago

Most recent Android devices support exFAT (ref https://www.compuhoy.com/can-android-read-exfat-file-system/). Considering that it has been part of the Linux kernel since 5.7 (ref https://kernelnewbies.org/Linux_5.7#New_exFAT_file_system), there is no longer any legal reason for Android phone carriers not to support it. We bet that all (new) devices will soon support it. I would be really surprised, for example, if a new Samsung device did not support exFAT.

I have no clue about the encryption topic within Android. I'm interested in any formal documentation explaining how an external SD card is automatically encrypted (so without user approval).

You seem to understand the splitting feature at ZIM level properly, but it is not broken or left unmaintained. You can still split files, just not at arbitrary points within a file, for the reasons and with the consequences you have explained. But it is not true that you cannot search at all without the full-text indexes; the basic suggestions feature (based on titles) should still work.
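
The point about not splitting "randomly somewhere in a file" can be sketched as a packing problem. The following is not how zim-tools is implemented; it is a minimal illustration that assumes the file is a sequence of indivisible units whose sizes are known (the unit list and the function name `plan_split` are hypothetical).

```python
from typing import List


def plan_split(unit_sizes: List[int], max_part_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive indivisible units into parts.

    Returns a list of parts, each a list of unit indices, such that no part
    exceeds max_part_bytes. Raises ValueError if a single unit is larger than
    the limit, because such a unit cannot be placed in any part.
    """
    parts: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(unit_sizes):
        if size > max_part_bytes:
            raise ValueError(f"unit {index} ({size} bytes) exceeds the part limit")
        # Start a new part when the next unit would overflow the current one.
        if current and current_size + size > max_part_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    # Toy sizes in arbitrary units, limit of 6 per part.
    print(plan_split([3, 3, 3, 2], 6))
```

With a 4GB limit, a single 5GB unit such as the full-text index described above makes the function raise, which mirrors the situation discussed in this thread: the rest of the file can be split cleanly, but that one part cannot.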

If you can not open a valid ZIM file with Kiwix-Desktop, please open a ticket there with all the details.

We have never tested NTFS and it is not officially supported, even if I could imagine it working.

The solution of this problem is pretty easy to me:

ballerburg9005 commented 2 years ago

Most recent Android devices support exFAT (ref https://www.compuhoy.com/can-android-read-exfat-file-system/). Considering that it has been part of the Linux kernel since 5.7 (ref https://kernelnewbies.org/Linux_5.7#New_exFAT_file_system), there is no longer any legal reason for Android phone carriers not to support it. We bet that all (new) devices will soon support it. I would be really surprised, for example, if a new Samsung device did not support exFAT.

What you said is only partially true and it doesn't change what I described. Android has native support for all sorts of suitable filesystems and can mount them from anywhere if you root your phone. Given that you do not root your phone, it just does not matter what the Linux kernel can do and you are limited to what the Android system (for whatever mysterious nonsensical reasons) accepts on SD cards: FAT32. Like I said Samsung supports NTFS and exFAT in a lot of new-ish phones, because the manufacturer uses proprietary custom modifications of the Android system, that is: customized ROMs.

  • Use a device which supports exFAT
  • Use a device with an internal storage big enough for the big ZIM files

This is the same as saying that those are the only solutions:

  1. buy a Samsung phone.
  2. buy 10 phones at random by chance to hit one with modded ROM
  3. wipe your entire phone, likely requiring you to resort to recovery image blob with spyware, use untrusted ROM, and/or then mod Android yourself with root privileges
  4. use SD card as forcefully encrypted "internal storage", which renders its content into garbage to any other device, prevents safe and full backups of the SD card contents and leaves you helpless to the buggy MTP protocol
  5. Buy a new $800 phone to fit a 87GB Wikipedia and 60GB Gutenberg file onto the internal 256GB flash memory (?)

This is not very acceptable in my mind and why I described the situation in such detail.

In truth the solutions are not viable if naturally concluded. Users are confronted with very hard limits and impossibilities. The only real answer to the problem is to abandon the idea of using Kiwix.

I don't think there is much of any kind of "formal documentation" if it comes to the absence of features or drawbacks introduced by new ones. When you google the topic on the internet, it is also confusing because most authors don't know exactly themselves what is going on and why, especially considering that what users report to be true is totally confounded by rooting, non-SD-card storage, and all those different Android versions and ROM variants out there.

I think the first thing you have to realize here is that Linux has always supported all sorts of file systems that are fit for the purpose, e.g. ext2, which Android always uses internally for the system partition. But you have never been able to make use of them, due to an extra additional limitation by the Android system that applies specifically to SD cards. So it is easy to understand, that this is not a matter of just getting a more recent kernel and more recent phone. The problem has never been that Android has not been "up to date enough" to be able to deal with file systems that are not straight from the 80s. It always internally could.

The second thing you have to realize is that it makes no sense that some phones were always able to support ext2 or NTFS and others were not, unless the reason was that they were using Android systems that differed from each other. And if this were just a matter of newer Android versions having newer features, it again makes no sense that to this day brand new phones still lack ext2, exFAT or NTFS support on SD cards.

Discounting far-off fringe explanations, this situation only really makes sense when you realize that only phones with custom ROMs have support for filesystems other than FAT32 on SD cards, i.e. that stock Android never had and still doesn't have support, because it puts in this stupid extra limitation on how SD cards are handled.

The third thing you may come to realize is, that this limitation probably is no accident and that it has been deliberately designed and kept in place all this time for moronic reasons by people you probably cannot educate to the better. This is most likely why manufacturers and hobbyists have been doing ROM mods for the last 15 years over and over again to add this feature, rather than simply pushing it into Android upstream directly.

If you really need proof that badly, tell me and I can fire up an emulator with the latest Android version and send you screenshots of how it rejects all non-FAT32 filesystems. But if you want me to do so, then please accept that there is only one sensible conclusion to draw from this test result: if even the latest version is affected, then all Android versions are.

I really think this calls out for a solution from Kiwix, that works for people other than basically just Samsung phone owners, phones with more than 12GB RAM and other oddball cases. Maybe me or people from the community can do some work on it. But I think there needs to be an understanding first about the nature of the issue.

kelson42 commented 2 years ago

The situation of exFAT support in the Linux kernel changed significantly after https://cloudblogs.microsoft.com/opensource/2019/08/28/exfat-linux-kernel/.

As I have written, this is a bet that native Android will support exFAT in the future.

For the moment, and afaik, most carriers recompile and customize their Android kernel anyway and add this feature. I would be happy about references if this is untrue. You just need to check before buying a new device.

I am not saying that the current situation is perfect, but these are the explanations and the chosen strategy on our end.

ballerburg9005 commented 2 years ago

I am sorry for writing so much, if it seems too much to read.

As mentioned, the problem is not that the Linux kernel is or has been unable to support modern suitable file systems. Since Android version 1 (almost 15 years ago) it has supported at minimum ext2, and iirc even non-experimental NTFS from version 4 on (9 years ago). We are now at version 12.

The problem is that Android puts in an artificial limitation - totally unrelated to Linux - that applies specifically to the mounting mechanism of SD cards, which dictates that only FAT32 is valid.
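
For anyone who wants to check what their own phone actually mounted the card as, the filesystem type is visible in `/proc/mounts`, where each line lists device, mount point, filesystem type and options. A minimal sketch that parses text in that format; the sample lines and the mount path are made up for illustration and will differ per device.

```python
from typing import Optional


def filesystem_type(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the filesystem type of mount_point from /proc/mounts-style text."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]  # a later entry shadows an earlier one
    return found


# Made-up sample; on a real device read the text from /proc/mounts instead.
SAMPLE = """\
/dev/block/vold/public:179,65 /mnt/media_rw/1234-ABCD vfat rw,nosuid,nodev 0 0
/dev/block/dm-0 /data ext4 rw,seclabel 0 0
"""

if __name__ == "__main__":
    print(filesystem_type(SAMPLE, "/mnt/media_rw/1234-ABCD"))  # vfat
```

Here `vfat` is the Linux name for the FAT32 driver; a card that the system accepted with another filesystem would show a different type in the third column.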

So it absolutely does not matter whether or not Linux now switches to an improved implementation of exFAT or BTRFS or ReiserFS or whatever it may be. It is just one of the filesystems from this very long list of filesystems with >4GB file support that Linux already supported all this time. exFAT is an oddball fringe filesystem, unlike NTFS, which was supported all this time. Now they have implemented it more nicely in Linux, but what does this change about the issue? Absolutely nothing.

We have been in this exact situation for 15 years. I think it is quite unreasonable to assume that this will change in another 5 years. Nothing whatsoever indicates that it will. To the contrary, it makes no sense whatsoever that this limitation has existed and persisted so long all this time, despite modders and manufacturers being quite keen to remove it in mainline Android for over 15 years. It is probably there because someone wants it to be exactly this way. If we just wait yet another year and yet another year for it to finally happen, like we did the last 15 years, that is quite illusory.

TL;DR: Android/Linux has had exFAT support for a very long time, and has had support for filesystems with >4GB files since it has existed (~15 years). However, by system design those filesystems are forbidden on SD cards. Nothing we know of indicates this problem will disappear; everything indicates that it persists on purpose. Normal Android users cannot use Kiwix.

starbrights commented 2 years ago

Download to internal sdcard (so phones internal data) does work, but gets interrupted - even if battery optimization is off. Also I can't copy that zim file downloaded via computer to phone - this part isn't accessible. I tried that with root permission, but for some reason that zim file will not be recognised.

So I agree - not really usable on Android.

kelson42 commented 2 years ago

@starbrights If your download fails, this is a 100% different problem; I can only recommend opening a ticket with all the details.

ballerburg9005 commented 2 years ago

@starbrights: I agree that given that your phone has this huge 256GB internal storage (which few phones have), you should just be able to save the ZIM file there. Your problem should not really be reproducible.

However if you don't ... to continue my odyssey to get Kiwix running without spending $250 on a new phone, I actually went through the process of rooting my phone in order to enable exFAT support. Like I said, this is a hardly viable path to take, because it wipes your phone and you have to write crappy hacks in order to bypass the normal SD card mounting mechanism. For one, it took me 10 days and thousands of lines of email and support chat messages, spread over 30 separate encounters, where I (amongst many other things) physically had to drive to an ATM and try to withdraw bogus amounts with my card while chatting with support to reactivate my bank account (it was frozen the entire time, so transactions would fail and such). I had to attempt video ident over 20 times because the first phone I used was too old to produce fluid video; then they would complain about echoes (at minimum volume, I had to borrow a USB-C headphone adapter...), bright lights, bird noises and all sorts of shit to cancel the whole process. The front camera of my main phone is broken, so I had to borrow another phone (the third phone...) from my mother, etc. etc. I had to contact them over 6 times because of the "pairing" of the wrong phone, which I couldn't undo because I was stuck in video ident, which I couldn't progress for the reasons named. Before that I had to contact them another half a dozen times to change the defunct phone number associated with my account, digging out old phone bills for them etc., although I had properly taken the precaution to unpair and pair the new temporary phone with a working front camera, as demanded by the app, before doing the wipe. This was a HUGE mess and charade, and all just because I wiped my phone. I had made proper backups of all the apps and all that, did everything you can possibly foresee, but they would still turn out broken. The reconfiguring, reinstalling, gathering lost contacts and things I had forgotten took days and days and days.
My SMS are gone (I had forgotten about that), plus half of my recent contacts are gone (they stopped being backed up by Google 2 years ago for no reason). This sort of shit ALWAYS happens if you wipe your phone, which is why it is a major disaster to root your phone (it requires wiping).

But wait ... this was not all of course. I then tried to reprogram Vold (the mechanism responsible for allowing only FAT32 on sd cards). You can read about my failure here. Long story short, even with root permissions it is impossible to change this mechanism, because it is heavily modified by vendor code and recompiling stock Android code will destroy your system partition each time.

So I tried all the other options (with considerable drawbacks) out there from xda-developers, but NONE WORKED. So I had to write my own hack script to mount exFAT. Btw. I am using several mystery binary blobs from XDA that could contain trojans at this point of course, plus recovery image blob from the marginally reputable XDA forum nick "msdev" (apparently some guy from Russia who has not posted anything the last 3 years). Writing your own script and compiling your own shortcut app is not trivial either. You can see the result for yourself

But wait ... it doesn't end here. As it turns out, exFAT is actually not viable to use on SD cards. After plugging the card in and out just a couple of times, without any prior writes but without clean unmounting, the filesystem turned out broken and could be repaired neither by Linux nor by Windows fsck. I also didn't find any forensic tools for exFAT, like they exist for other filesystems, to restore my data. This means that all data was lost! So I had to format it as NTFS instead and use a backup to restore my SD card content. With my script you could also use ext4, which would be even safer, but I wanted my SD card to be readable on other devices as well. With NTFS you can still plug it into other phones with a card reader and OTG, or directly into Windows PCs. But of course only with exFAT is there a slim chance that the card will work in the SD card slot of future phones.

DO NOT USE exFAT. You will probably lose all your data.

Also I want to correct a statement I made. When I reprogrammed Vold, I had plenty of time to look at the source code of it.

Like you can look up yourself there in Git version history, the fact of the matter is as such:

This pretty much leaves the same dilemma that I already described: mobile Kiwix is totally broken for Wikipedia and other large ZIM files, without any viable alternatives. The only working solution is pretty much to buy a new phone with giant internal storage, or maybe a Samsung phone (custom exFAT implementation and custom ROM - but really you have to ask, is exFAT safe enough yet to use?). This is not something you can call "viable" though. If you don't just accidentally happen to already have it, it is insane to spend $250 on a new phone just to have some app run. An app that in a normal world "should" be able to run easily on phones that are even 10 years older than yours (and actually in Android 4 before all this Vold crap and root requiring a wipe they did!).

Also it is questionable whether average budget phones will ever hit the 256GB mark at some distant point in the future, which is what is required to save bigger future versions of Wikipedia while still maintaining enough internal storage space. Traditionally, storage has doubled every few years. But marginal utility might be reached at a mere 64GB, which is what budget phones will aim at. Marginal utility is also the reason why most people like myself will not even buy new phones. They don't run out of internal storage space anymore like they used to, and they don't get slower and slower each year like they used to. A decent phone from 5 years ago is still perfectly fine to use and probably will be fine to use for another 5 years. My phone has only 32GB and a third of it is still free, with basically "infinite" memory available to upgrade via SD card.
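
To make the "doubled every few years" reasoning concrete: if capacity doubles once per fixed period, the time to grow from one size to another is log2(target / current) periods. The sketch below is only that arithmetic. The doubling period is an assumption you have to supply yourself; it is not a measured figure, and the function name is made up for illustration.

```python
import math


def years_until_capacity(current_gb: float, target_gb: float,
                         doubling_period_years: float) -> float:
    """Years until capacity grows from current_gb to target_gb,
    assuming it doubles once every doubling_period_years."""
    if current_gb <= 0 or target_gb <= 0 or doubling_period_years <= 0:
        raise ValueError("all arguments must be positive")
    if target_gb <= current_gb:
        return 0.0
    return math.log2(target_gb / current_gb) * doubling_period_years


if __name__ == "__main__":
    # Going from 64GB to 256GB takes two doublings; the period is a guess.
    for period in (2, 3, 5):
        print(period, years_until_capacity(64, 256, period))
```

If typical capacity stalls at 64GB because of marginal utility, the doubling model no longer applies at all, so this only bounds the optimistic case.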

Based on this, I would say it is highly realistic to project that it will take at least 10 years before >80% of users will own a phone that will be able to sanely run Kiwix with Wikipedia (EN, full & up to date at that time). But only given that the new exFAT implementation is totally unlike the old one, where just plugging the card in and out 3 times totally destroys the entire filesystem. Even a filesystem just remotely like that is not "sanely" usable at all. Suppose exFAT stays like this, maybe we are even looking at 15 or 20 years instead, with only expensive high-end phones having enough internal storage in the meantime.

This is why I strongly advocate for Kiwix to solve the problem for the foreseeable future with split ZIM files.

starbrights commented 2 years ago

I am using LOS 19 and I think that supports exFAT on the SD card. But (maybe I am blind) I was not able to set the folder to the external SD card. Anyway, I would prefer to download on PC and move the file to the phone later. I guess downloading on the phone (as on PC) takes hours, and battery optimisation, screen off, etc. might interrupt the download.

I am back to Aard2. The German wiki is just 6GB. Is it right that the Kiwix counterpart of that is around 14GB? On the download page in the app there are no details of what is included and what is not. The advantage of Kiwix is that images are included, so 38 GB if I remember right.

kelson42 commented 2 years ago

Closing this ticket as there is nothing more we can do here.