mxmlnkn / ratarmount

Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives
MIT License

Multi-threaded Crash #136

Closed: dmadisetti closed this issue 6 months ago

dmadisetti commented 6 months ago

I use this all the time and I love it, thanks. This is the first time I have had issues.

ratarmount -v

ratarmount 0.15.0
ratarmountcore 0.6.3

System Software:

Python 3.11.9
FUSE 2.9.9
libsqlite3 3.45.2

Compression Backends:

indexed_bzip2 1.6.0
indexed_gzip 1.8.7
indexed_zstd 1.1.3
rapidgzip 0.13.2
rarfile 4.2
xz 0.5.0

Versioned Loaded Shared Libraries:

ld-linux-x86-64.so.2
libc.so.6
libdl.so.2
libm.so.6
libpthread.so.0
libcrypto.so.3
libssl.so.3
libgcc_s.so.1
libstdc++.so.6.0.32
libffi.so.8.1.4
libpython3.11.so.1.0
libbz2.so.1.0.8
liblzma.so.5.4.6
libfuse.so.2.9.9
libz.so.1.3.1
libzstd.so.1.5.6
libsqlite3.so.0.8.6

on NixOS


My data is about 39 GB and I mount it with ratarmount -f Data.tar.gz. My stripped-down code looks something like:

import concurrent.futures
import json

def fn(metadata_path):
    with open(metadata_path) as file:
        instance = json.load(file)
    result = ...  # process `instance` (details stripped out)
    return result

# `files` is the list of metadata file paths under the mount point.
with concurrent.futures.ProcessPoolExecutor() as executor:
    results = list(executor.map(fn, files))

The severity of the crash varied from the program stopping to everything, including my DE, crashing. However, after decompressing the archive and running on the extracted files, I had no crashes. Could it just be a memory issue from the overhead? I have about 80 GB of RAM, but this process caps out at about 50 GB.

I don't necessarily know if this is reproducible; I just wanted to report it. Feel free to take note and close this issue out.

mxmlnkn commented 6 months ago

Could just be memory issue from the overhead

Yes, it sounds a lot like it, especially if your desktop environment is "crashing". I put that in quotes because it probably gets killed by the OOM killer. You could check the logs, e.g., with sudo dmesg or cat /var/log/syslog.
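
For example, a small Python sketch for filtering the kernel log for OOM-killer entries (the dmesg call and the search pattern are only illustrative; dmesg may need sudo on some systems):

# Sketch: scan the kernel ring buffer for OOM-killer messages.
# `dmesg --ctime` prints human-readable timestamps; may require elevated privileges.
import re
import subprocess

kernel_log = subprocess.run(
    ["dmesg", "--ctime"], capture_output=True, text=True, check=True
).stdout
for line in kernel_log.splitlines():
    if re.search(r"out of memory|oom-killer|killed process", line, re.IGNORECASE):
        print(line)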

However, after decompressing and running, I had no crashes.

This would also fit a memory issue because first-time decompression (without an index) may use vastly more memory than when an index already exists.

Recommended solutions to try:

I'm also working on a new rapidgzip version that tries to reduce memory usage, although rapidgzip 0.13.2, which you are using, should already have some in-memory compression...

It would be interesting to know the compression ratio of your file. The 39 GB, I assume, is the compressed size, so how large is it decompressed? Try rapidgzip --count <file>. How large are the individual files on average, and how large are the largest files?

Also, does it happen with older rapidgzip versions? Ratarmount says that it wants rapidgzip >= 0.13.0, but it should also work with older rapidgzip versions if you downgrade. Or you could try an older ratarmount version altogether.

What I don't understand is your Python code. If it really is a memory issue with first-time decompression, then it should already happen during the ratarmount call, and it should not be necessary to access any files at all. So maybe it is something else ...

Does the problem occur without ProcessPoolExecutor?
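
For reference, a minimal sketch of how to test that, reusing the hypothetical fn() and files from the snippet above, either without the pool or with a capped worker count:

# Variant 1: no process pool at all.
results = [fn(path) for path in files]

# Variant 2: keep the pool but cap the worker count to bound peak memory use.
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(fn, files))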

dmadisetti commented 6 months ago

Definitely an OOM. I triggered the same response without ratarmount on a rerun; the difference being that the ratarmount case quit very quickly.

74 GB decompressed. I'm sure I can get that down way further; it's just floats in ASCII. The median file looks to be about 5 MB and I have about 10k files. (I just ran du on the uncompressed data; rapidgzip took a long time.)

dmadisetti commented 6 months ago

Works fine single-threaded, but this sounds like more of a user error. Thanks!

mxmlnkn commented 6 months ago

Works fine single-threaded, but this sounds like more of a user error. Thanks!

User error insofar as the memory usage came from another program? If the default usage leads to an out-of-memory situation on a system with 80 GB of memory, then I wouldn't categorize it as a user error, even if -P 1 helps ...

But I don't understand where the memory is going. It would have to buffer the whole 74 GB of decompressed data into memory to fill up that system, which definitely shouldn't happen. Maybe there is a memory leak somewhere. Could you analyze the memory usage, e.g., with /usr/bin/time -v ratarmount -f ..., which reports the peak RSS ("Maximum resident set size" line)?
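
For the Python side, the peak RSS of each worker process can also be read from inside the script with the standard resource module; a rough sketch that wraps the fn() from the first post (fn_with_rss is just an illustrative name):

# Sketch: report each worker's peak resident set size after processing a file.
# On Linux, ru_maxrss is reported in kilobytes.
import resource

def fn_with_rss(metadata_path):
    result = fn(metadata_path)  # fn() as defined in the original snippet
    peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    print(f"{metadata_path}: peak RSS {peak_mib:.0f} MiB")
    return result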

dmadisetti commented 6 months ago

Loading gzip block offsets took 1.32s
    Command being timed: "ratarmount -f Data.tar.gz"
    User time (seconds): 27.73
    System time (seconds): 5.55
    Percent of CPU this job got: 4%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 13:47.47
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1462476
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 350
    Minor (reclaiming a frame) page faults: 810926
    Voluntary context switches: 126679
    Involuntary context switches: 24440
    Swaps: 0
    File system inputs: 23203750
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

mxmlnkn commented 6 months ago

Hm, 1.4 GB memory usage. Not good, but also not bad. Seems like it works fine. Based on the "User time", I am assuming that this is with the Python script reading from the mount point? It would be interesting to see this output for the case where it gets killed because of insufficient memory, although I'm not sure whether the output will be shown in that case.

dmadisetti commented 6 months ago

If you're content, I'm going to close this out. I think the overhead was enough to make my memory-greedy code more noticeable :)

Thanks!