openscopeproject / ZipROFS

FUSE file system with transparent access to zip files as if they were folders.
MIT License
12 stars 5 forks source link

python3 high cpu usage #10

Closed EarBiteR closed 11 months ago

EarBiteR commented 11 months ago

I get a python3 process pegged at 100% when trying to use a file in a zip. I am running Linux Mint 21.2 x86_64 - Kernel: 5.15.0-86-generic

I also tried ZipROFS from Dec 2022 and same failure.

File access and copying (from inside the zip) is normal using nemo file manager.
Cant find anything logged and not sure how to debug python3. Any suggestions?

qu1ck commented 11 months ago

Run ziprofs from terminal (not as a mount) and pass -o foreground,debug options. It will print out system calls, I suggest you test it on small files to not clog your output.

Then play around with the file system to try to find what causes high cpu usage, hopefully it is caused by something repeatedly calling the file system.

qu1ck commented 11 months ago

Few more things to try to narrow down possible causes. Access ziprofs mounted files from terminal to do specific commands. For example stat will only get attributes, dd will only open and read.

Then check if high cpu usage starts only after specific command, and if it stops after command finishes.

EarBiteR commented 11 months ago

Thanks. I tried that (on 14.9m) and it comes up with pages of: DEBUG:fuse.log-mixin:<- read b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x04\x00\xfe\xff\x0c\x00\x06\x00\x00\x00\x00\x00\x00\x00\x1b\x00\x00\x00\x04\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x1e\x00\x00\x00\x10\x00\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\x03\x00\x00\x00\xc8\x01\x00\x00\xc1\x06\x00\x00\xf0\x0b\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff

and breaks between show:

DEBUG:fuse.log-mixin:-> read /media/ron/506dd748-87e1-405a-93eb-ec849705d09e/Pinball/tables/VP.2019/VPX/Visual Pinball [VPXx] PinMame Tables/250cc (Inder)(1982)(JPSalas)(1.0.0)[VPX05][DT+FS+dB2S+DOF].zip/250cc (Inder - 1992).vpx (8192, 13611008, 9) DEBUG:fuse.log-mixin:<- read b'\x01\x18\x00\x00\x02\x18\x00\x00\x03\x18\x00\x00\x04\x18\x00\x00\x05\x18\x00\x00\x06\x18\x00\x00\x07\x18\x00\x00\x08\x18\x00\x00\t\x18\x00\x00\n\x18\x00\x00\x0b\x18\x00\x00\x0c\x18\x00\x00\r\x18\x00\x00\xfe

I tried it on a smaller file (9.6m) and it shows:

DEBUG:fuse.log-mixin:-> read /media/ron/506dd748-87e1-405a-93eb-ec849705d09e/Pinball/tables/VP.2019/VPX/Visual Pinball [VPXx] Recreated Tables/1-2-3 (Talleres de Llobregat)(1973)(JPSalas)(1.0.1)[VPX05][DT+FS][dB2S Loserman76].zip/1.0.1/_123(Talleres de Llobregat 1973).vpx (8192, 8392704, 11) DEBUG:fuse.log-mixin:<- read b'\xfd\xff\xff\xff\x02\x08\x00\x00\x03\x08\x

and just sits there proceeding through the file. I dont see anything logged giving me any clue as to what other processes may be the cause. Only a 'python3' pegged at 100% cpu which I assume is the Ziprofs mount. Supposedly 3.11 is a lot faster than the 3.10.12. I loaded 3.11 up and it does the same.

I am using shell with scripts to check and mount and then Zenity for a gui file selector. I tried it without zenity and same. Just the python3 pegged and the gnome-terminal due to debug mode.

qu1ck commented 11 months ago

I'm confused, is your issue fixed or did you close this by accident?

First confirm that the python process with high cpu usage is ziprofs, you can see full command of the process in htop program.

Second, did you find any correlation for when is the cpu usage high? Is it all the time? Only after specific commands? Only during specific commands?

EarBiteR commented 11 months ago

Thanks. I tried that (on 14.9m) and it comes up with pages of: DEBUG:fuse.log-mixin:<- read b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x04\x00\xfe\xff\x0c\x00\x06\x00\x00\x00\x00\x00\x00\x00\x1b\x00\x00\x00\x04\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x1e\x00\x00\x00\x10\x00\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\x03\x00\x00\x00\xc8\x01\x00\x00\xc1\x06\x00\x00\xf0\x0b\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff

and breaks between show:

DEBUG:fuse.log-mixin:-> read /media/ron/506dd748-87e1-405a-93eb-ec849705d09e/Pinball/tables/VP.2019/VPX/Visual Pinball [VPXx] PinMame Tables/250cc (Inder)(1982)(JPSalas)(1.0.0)[VPX05][DT+FS+dB2S+DOF].zip/250cc (Inder - 1992).vpx (8192, 13611008, 9) DEBUG:fuse.log-mixin:<- read b'\x01\x18\x00\x00\x02\x18\x00\x00\x03\x18\x00\x00\x04\x18\x00\x00\x05\x18\x00\x00\x06\x18\x00\x00\x07\x18\x00\x00\x08\x18\x00\x00\t\x18\x00\x00\n\x18\x00\x00\x0b\x18\x00\x00\x0c\x18\x00\x00\r\x18\x00\x00\xfe

I tried it on a smaller file (9.6m) and it shows:

DEBUG:fuse.log-mixin:-> read /media/ron/506dd748-87e1-405a-93eb-ec849705d09e/Pinball/tables/VP.2019/VPX/Visual Pinball [VPXx] Recreated Tables/1-2-3 (Talleres de Llobregat)(1973)(JPSalas)(1.0.1)[VPX05][DT+FS][dB2S Loserman76].zip/1.0.1/_123(Talleres de Llobregat 1973).vpx (8192, 8392704, 11) DEBUG:fuse.log-mixin:<- read b'\xfd\xff\xff\xff\x02\x08\x00\x00\x03\x08\x

and just sits there proceeding through the file. I dont see anything logged giving me any clue as to what other processes may be the cause. Only a 'python3' pegged at 100% cpu which I assume is the Ziprofs mount. Supposedly 3.11 is a lot faster than the 3.10.12. I loaded 3.11 up and it does the same.

I am using shell with scripts to check and mount and then Zenity for a gui file selector. I tried it without zenity and same. Just the python3 pegged and the gnome-terminal due to debug mode.

EarBiteR commented 11 months ago

SORRY, I did not mean to close as I have not resolved this yet. Top shows a python3 process but I dont see ziprofs. I will keep trying today to see if I can find something. Thanks for your help!

EarBiteR commented 11 months ago

Screenshot from 2023-10-22 08-47-05 Screenshot from 2023-10-22 08-50-47

Top only shows the single ziprofs process but with htop I see all of them now. Running foreground,debug I see up to 11 with the 1st process pegged and the rest all using maybe 100%. So its not taking advantage of all 16 cores that I have. With just foreground its similar with 1 process pegged and 3-4 more using up to 100% total, not each. Hope these pics help with what I just described. Is it a python tuning thing? I searched and found things about processor pools:

from multiprocessing import Pool, cpu_count (from https://stackoverflow.com/questions/61465029/multiprocessing-not-using-all-cores)

so wonder if the script just needs to be setup for that. Should I bump the cache values as I have 32g ram? I havent coded python so dont know much about it.

This one looks like it would take a bit of coding - https://stackoverflow.com/questions/50605186/threading-not-using-all-cores-in-python

qu1ck commented 11 months ago

Are your zip files compressed? If you do random access reads on compressed zip files you should expect bad performance and high cpu usage because for every file seek you have to read full zip file up to that seek position, that's how zip files work.

If your files are uncompressed than you should get much better performance with python 3.12 because it optimized reads of such archives.

In general you will not be able to spread the load across CPUs, not all tasks are parallelizable, file systems in general and (compressed) zip reads are sequential.

If you have RAM to spare and need high performance random access reads from compressed zip archives then copy the files from ziprofs mount to a ramdisk and use them from there.

EarBiteR commented 11 months ago

Yes, they are compressed pinball tables. It should just be loading the single table file. I appreciate you listening to me rant. I will write a script that just unzips the file first, let me choose the table to launch(play) then delete the temporary folder. Ziprofs used to work and VERY well with VP using WINE so I thought I would ask. I first started out with AVFS (https://avf.sourceforge.net/) but you have to add a # at the end of the file to make it show the zip contents where as ziprofs is MUCH friendlier! That worked fine when access from the table editor under wine. Now there is a native linux version of Visual Pinball (VPinballX_GL) but using zenity I cant do the # trick for AVFS. From Nemo filemanager I can copy a file from a ziprofs mount and it takes 1 second and is done. Reading the same file from visual pinmame takes many minutes or I give up. So ziprofs works great except with VP. If I figure it out I will update this..

qu1ck commented 11 months ago

It sounds like VPinballX_GL read pattern is random but it reads a single file. It's possible to optimize ZipROFS for such use case by adding single file cache for last read file, making essentially a built in single file ramdisk.