Closed: dmadisetti closed this issue 6 months ago
> Could just be a memory issue from the overhead?
Yes, it sounds a lot like it, especially if your desktop environment is "crashing". I put that in quotes because it probably gets killed by the OOM killer. You could check the logs, e.g., with sudo dmesg or cat /var/log/syslog.
> However, after decompressing and running, I had no crashes.
This would also fit a memory issue because first-time decompression (without an existing index) may use vastly more memory than when an index already exists.
Recommended solutions to try:
ratarmount -f -P 1 Data.tar.gz
ratarmount --use-backend=indexed_gzip
I'm also working on a new rapidgzip version that tries to reduce memory usage, although rapidgzip 0.13.2, which you are using, should already have some in-memory compression...
It would be interesting to know the compression ratio of your file. The 39 GB, I assume, are compressed, so how large is it decompressed? Try rapidgzip --count <file>. How large are the individual files on average, and how large are the largest ones?
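If it helps, those numbers can also be pulled from the mount point itself with a few lines of Python; file sizes come from the index metadata, so this should not trigger any decompression. A rough sketch (the mount point path is just a placeholder):

```python
import os
import statistics

# Placeholder: adjust to wherever ratarmount mounted Data.tar.gz.
mount_point = "Data"

sizes = []
for root, _dirs, files in os.walk(mount_point):
    for name in files:
        sizes.append(os.path.getsize(os.path.join(root, name)))

print(f"files : {len(sizes)}")
print(f"total : {sum(sizes) / 1e9:.1f} GB")
print(f"median: {statistics.median(sizes) / 1e6:.1f} MB")
print(f"max   : {max(sizes) / 1e6:.1f} MB")
```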
Also, does it happen with older rapidgzip versions? Ratarmount says that it wants rapidgzip >= 0.13.0, but it should also work with older rapidgzip versions if you downgrade. Or you could try an older ratarmount version altogether.
What I don't understand is your Python code. If it really is a memory issue with first-time decompression, then it should already happen on the ratarmount call, and it shouldn't even be necessary to access any files. So maybe it is something else ...
Does the problem occur without ProcessPoolExecutor?
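To be clear about what I mean, here is a minimal sketch of the two variants; the mount path and the per-file work are placeholders, not your actual code:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Placeholder: wherever ratarmount mounted Data.tar.gz.
MOUNT = Path("Data")

def process(path):
    # Stand-in for the real per-file work; here it only reads the whole file.
    return len(Path(path).read_bytes())

def main():
    files = [str(p) for p in MOUNT.rglob("*") if p.is_file()]

    # Variant 1: sequential reads through the FUSE mount.
    total = sum(process(f) for f in files)

    # Variant 2: ProcessPoolExecutor. Each worker is a separate process, so
    # whatever memory process() holds per file gets multiplied by the number
    # of workers, and all of them read through the same FUSE mount at once.
    with ProcessPoolExecutor(max_workers=8) as pool:
        total = sum(pool.map(process, files))

    print(total)

if __name__ == "__main__":
    main()
```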
Definitely an OOM. Triggered the same response without ratarmount on a rerun; the difference being that the ratarmount case quit very quickly.
74 GB decompressed. I'm sure I can get that down way further; it's just floats in ASCII. The median file looks to be about 5 MB and I have about 10k files (I just ran du on the uncompressed data; rapidgzip took a long time).
Works fine single-threaded, but it sounds like more of a user error. Thanks!
> Works fine single-threaded, but it sounds like more of a user error. Thanks!
User error insofar as the memory usage came from another program? If the default usage leads to out of memory on an 80 GB system, then I wouldn't categorize it as a user error, even if -P 1 helps ...
But I don't understand where the memory is going. It would have to buffer the whole 74 GB of decompressed data in memory to fill up that system, which definitely shouldn't happen. Maybe there is a memory leak somewhere. Could you analyze the memory usage, e.g., with /usr/bin/time -v ratarmount -f ..., which has a peak RSS line ("Maximum resident set size")?
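If it is more convenient, the Python script can also report its own peak RSS via the standard resource module; a small sketch, assuming Linux (where ru_maxrss is in kilobytes):

```python
import resource

# ... run the workload here ...

# Peak RSS of the current process, in kilobytes on Linux.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS (this process): {peak / 1024:.0f} MiB")

# Terminated and waited-for children (e.g., ProcessPoolExecutor workers)
# are accounted for separately; the ratarmount process itself still needs
# to be measured externally, e.g., with /usr/bin/time -v as above.
peak_children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"peak RSS (children):     {peak_children / 1024:.0f} MiB")
```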
Loading gzip block offsets took 1.32s
Command being timed: "ratarmount -f Data.tar.gz"
User time (seconds): 27.73
System time (seconds): 5.55
Percent of CPU this job got: 4%
Elapsed (wall clock) time (h:mm:ss or m:ss): 13:47.47
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1462476
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 350
Minor (reclaiming a frame) page faults: 810926
Voluntary context switches: 126679
Involuntary context switches: 24440
Swaps: 0
File system inputs: 23203750
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Hm, 1.4 GB memory usage. Not good, but also not bad. Seems like it works fine. Based on the "User time", I am assuming that this is with the Python script reading from the mount point? It would be interesting to see the output in case it gets killed because of insufficient memory, although I'm not sure whether the output will be shown in that case.
If you're content, I'm going to close this out. I think the overhead was enough to make my memory-greedy code more noticeable :)
Thanks!
I use this all the time, I love it, thanks. This is the first time I have had issues.
ratarmount -v on NixOS
My data is about 39 GB and I mount with ratarmount -f Data.tar.gz. My stripped-down code looks something like:

The severity of the crash varied from just the program stopping, to everything, including the DE, crashing. However, after decompressing and running, I had no crashes. Could just be a memory issue from the overhead? I have about 80 GB, but this process caps out at about 50 GB.
I don't necessarily know if this is reproducible; I just wanted to report it. Feel free to take note and close this issue out.