zhanghai / MaterialFiles

Material Design file manager for Android
https://play.google.com/store/apps/details?id=me.zhanghai.android.files
GNU General Public License v3.0

Extraction of tar.gz file takes too long #1237

Open shenghuang147 opened 5 months ago

shenghuang147 commented 5 months ago

Device Model: XiaoMi 10 Ultra
Android Version: 10 QKQ1.200419.002
MIUI Version: xiaomi.eu 12.0.10
MaterialFiles Version: 1.7.2 (37)
Source: F-Droid

I compressed all font files from the Fonts folder of Windows 11 21H2 into Fonts.tar.gz, which contains a total of 336 font files.

I found that MaterialFiles takes 5 minutes to fully extract this file, while tar -zxf Fonts.tar.gz in Termux takes only 3.3 seconds.

.../Fonts $ time tar -zxf ../Download/Fonts.tar.gz

real    0m3.300s
user    0m2.785s
sys     0m0.482s

start: 00:55:40 (screenshot)

end: 01:00:37 (screenshot)

zhanghai commented 5 months ago

How large is the tar.gz file itself? My suspicion is that the decompression of every single file requires reading the archive from the beginning again. Not sure if this can be optimized easily.

shenghuang147 commented 5 months ago

> How large is the tar.gz file itself? My suspicion is that the decompression of every single file requires reading the archive from the beginning again. Not sure if this can be optimized easily.

Only 191 MB.

zhanghai commented 5 months ago

Then it may just be that reason: if every file is read starting from the beginning, about half of the 191 MB archive has to be decompressed per file on average, which is roughly 300 * 100 MB = 30 GB of data in total.
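(For illustration, a minimal Kotlin sketch of that access pattern, assuming Apache Commons Compress; the function names are hypothetical and this is not the actual MaterialFiles code. Because a TAR archive has no index, reading any one entry of a .tar.gz means decompressing and scanning the stream from the start, so treating "extract all" as repeated single-entry reads decompresses about half of the archive per file on average.)

```kotlin
import java.io.File
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream

// Extract a single named entry: the gzip stream is decompressed and
// scanned from the beginning because TAR has no index for random access.
fun extractOneEntry(archive: File, entryName: String, destDir: File) {
    TarArchiveInputStream(
        GzipCompressorInputStream(archive.inputStream().buffered())
    ).use { tarIn ->
        var entry = tarIn.nextTarEntry
        while (entry != null) {
            if (entry.name == entryName && !entry.isDirectory) {
                val target = File(destDir, entry.name)
                target.parentFile?.mkdirs()
                target.outputStream().use { out -> tarIn.copyTo(out) }
                return
            }
            entry = tarIn.nextTarEntry
        }
    }
}

// "Extract all" implemented as repeated single-entry reads: for ~336 entries
// in a 191 MB archive this decompresses roughly 336 * ~95 MB ≈ 30 GB in total.
fun extractAllNaively(archive: File, entryNames: List<String>, destDir: File) {
    entryNames.forEach { extractOneEntry(archive, it, destDir) }
}
```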

Feuerswut commented 4 months ago

Yeah, I have this issue as well; it depends on the library implementation. Maybe the "extract all" operation specifically should be optimised.

masterflitzer commented 6 days ago

> My suspicion is that the decompression of every single file requires reading the archive from the beginning again

This is not how it's supposed to work, and tar on the CLI doesn't work like that either. A .tar.gz is archived and then compressed (not the other way around), so it needs to be decompressed only once at the beginning, giving you a .tar (temporarily); then the files get extracted from that tar, and after all files are extracted the temporary tar is deleted. So there should be no decompression for each individual file, only for the archive itself.

zhanghai commented 6 days ago

> This is not how it's supposed to work, and tar on the CLI doesn't work like that either. A .tar.gz is archived and then compressed (not the other way around), so it needs to be decompressed only once at the beginning, giving you a .tar (temporarily); then the files get extracted from that tar, and after all files are extracted the temporary tar is deleted. So there should be no decompression for each individual file, only for the archive itself.

That doesn't really matter. As long as the archive format doesn't support random access (TAR doesn't have an index), if you try to read one file independent of the other files you'll always have to start from the beginning. More complex logic could be implemented so that reading of different files from the same TAR archive can be coordinated in some way so that only one pass is needed, but I don't have the bandwidth for that right now.
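(For comparison, a single pass like `tar -zxf` decompresses the gzip stream exactly once and writes each entry out as the sequential TAR stream reaches it. A hedged Kotlin sketch under the same Apache Commons Compress assumption, again not MaterialFiles code:)

```kotlin
import java.io.File
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream

// One pass over the archive: each entry is written as it is encountered,
// so the compressed data is decompressed exactly once.
fun extractAllInOnePass(archive: File, destDir: File) {
    TarArchiveInputStream(
        GzipCompressorInputStream(archive.inputStream().buffered())
    ).use { tarIn ->
        var entry = tarIn.nextTarEntry
        while (entry != null) {
            val target = File(destDir, entry.name)
            if (entry.isDirectory) {
                target.mkdirs()
            } else {
                target.parentFile?.mkdirs()
                target.outputStream().use { out -> tarIn.copyTo(out) }
            }
            entry = tarIn.nextTarEntry
        }
    }
}
```

(Getting there in MaterialFiles would mean coordinating the per-file reads into one shared pass over the archive, which is the extra logic mentioned above.)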

masterflitzer commented 5 days ago

But this is about extracting the whole archive, not individual files, so the equivalent of tar -axvf archive.tar.

zhanghai commented 5 days ago

Like I said earlier, as a file operation, extracting all files in an archive isn't any different from extracting the individual files right now.

masterflitzer commented 5 days ago

Yes, I know. All I'm saying is that the differentiation between extracting all files and extracting individual files needs to be implemented, and then the issue is fixed.