marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Hang when demultiplexing on many cores with many adapters #613

Open AlexanderBartholomaeus opened 2 years ago

AlexanderBartholomaeus commented 2 years ago

I am observing strange behavior that I have never seen before when using many adapters with many cores. I have a file with 182 adapters. When I use 40 cores (or fewer), cutadapt starts processing (indicated by CPU usage and the status bar appearing) after 2-3 seconds. When I use 50 cores (or more), after a few seconds nothing happens anymore: no CPU usage, no status bar appearing.

When I reduce the number of adapters, it works with 50 cores. Before, I had ~140 adapters and used 80 cores without any issue.

I have tried cutadapt 3.4 and the latest 3.7 with Python 3.8.5. The machine has 3 TB of RAM and 72 cores / 144 threads. The workaround for the moment is just to use fewer cores and wait a bit longer. I really like the multicore option, which has also been working with many adapters for a while now! Thank you!

I am using paired-end mode; the adapters file is attached. The exact cutadapt call is: cutadapt --no-indels --no-trim --cores=50 -g file:adapt.fa -o out/trimmed-{name}_R1.fastq.gz -p out/trimmed-{name}_R2.fastq.gz input_1.fastq.gz input_2.fastq.gz

adaptersLeft.zip

marcelm commented 2 years ago

Could you please add the --debug option and paste the log output here?

AlexanderBartholomaeus commented 2 years ago

Here is the output after 3 minutes of waiting. Thank you for taking care of this!

out.zip

marcelm commented 2 years ago

A couple of further thoughts.

marcelm commented 2 years ago

Thanks. Looking at the debug log, I see that many pigz processes are started to handle compression, which is perhaps the problem. Since you write to 182 * 2 = 364 compressed output files and one pigz process using four threads is spawned for each file, running the Cutadapt command starts up around 1500 threads. This is a bit excessive ... This has been discussed before in issue #290, see also https://github.com/marcelm/cutadapt/issues/290#issuecomment-790685640. It seems I need to look into this again.
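For concreteness, the thread count works out roughly like this (a back-of-the-envelope sketch; the four threads per pigz process is the figure mentioned above, not something re-measured here):

    # Rough estimate of compression threads for this run:
    # one pigz process per compressed output file, several threads each.
    adapters = 182
    output_files = 2 * adapters        # paired-end: {name}_R1 and {name}_R2 -> 364 files
    threads_per_pigz = 4               # threads assumed per pigz process
    print(output_files * threads_per_pigz)  # 1456, i.e. "around 1500 threads"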

For the moment, can you try adding the -Z option to your command? It sets the gzip compression level for the output files to 1, so compression is much faster (the files are a little bit larger). The relevant side effect is that a different program (igzip) is used for compression, and it does not use as many threads.

AlexanderBartholomaeus commented 2 years ago

-Z did not change anything; output attached. However, when I write to uncompressed files, everything works, even with 100 cores!

Thank you!

out (2).zip

marcelm commented 2 years ago

Alright! It’s a bit weird because both log outputs you sent me indicate that the program is actually doing something, so it should at least have some CPU usage and show a progress bar.

I’ll just have to leave this be since I don’t have a machine where I could reproduce this. Good that you have found a workaround!

marcelm commented 2 years ago

I think I was now able to reproduce this locally. Perhaps I can find a way to fix this. Thanks for reporting.

AlexanderBartholomaeus commented 2 years ago

Thank you for taking care of this and for the quick reply! It's really nice to see all the improvements!

marcelm commented 2 years ago

I’ll need to re-open this so I don’t forget.

marcelm commented 2 years ago

I was able to reproduce this using the following command:

cutadapt --debug --cores=50 -g file:adapters.fasta -o out-{name}_R1.fastq.gz -p out-{name}_R2.fastq.gz data_R?.fastq.gz

After adding some debug logging, I see that only 45 worker processes are started, and then WorkerProcess.start() for the 46th never returns. It appears that an OSError: [Errno 24] Too many open files occurs within that method call, but it is somehow hidden and does not make the program crash as one would expect.
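As a standalone sketch (not Cutadapt code) of how starting many worker processes runs into this error: each multiprocessing worker keeps at least one pipe file descriptor open in the parent, so a low descriptor limit is exhausted after a few dozen workers. The limit of 64 and the worker count below are arbitrary demo values; a Linux fork start method is assumed.

    import multiprocessing
    import resource

    def work():
        pass  # placeholder worker, does nothing

    if __name__ == "__main__":
        # Artificially lower the soft file-descriptor limit to trigger the error quickly.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

        ctx = multiprocessing.get_context("fork")
        workers = []
        try:
            for _ in range(200):
                p = ctx.Process(target=work)
                p.start()          # keeps a pipe descriptor open in the parent per worker
                workers.append(p)
        except OSError as e:
            # With the lowered limit this fails with [Errno 24] Too many open files.
            print(f"start() failed after {len(workers)} workers: {e}")
        finally:
            for p in workers:
                p.join()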

@AlexanderBartholomaeus Can you check what ulimit -n prints on your system? That command shows the number of files that are allowed to be open simultaneously by a single process. You should be able to increase that number by issuing ulimit -S -n 10000. In my case, that avoids the hang. If you can confirm, I should be able to add a workaround to Cutadapt.
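For reference, here is a sketch of how a Python program could raise its own soft limit from within the process (just an illustration of the idea, not necessarily the workaround that will land in Cutadapt):

    import resource

    # Current soft/hard limits on open file descriptors for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Raise the soft limit, mirroring `ulimit -S -n 10000`; a process may
    # only raise it up to its hard limit without extra privileges.
    target = 10000
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))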

AlexanderBartholomaeus commented 2 years ago

Perfect, it works! ulimit -n showed 1024. After setting it to 10000, it also works with 100 cores.

marcelm commented 2 years ago

Super! I’ll try to find a fix. Because I cannot continue to work on this today, I’ll write down my findings for later.

In __main__.py, the problem is here:

    with runner as r:  # runner is a ParallelPipelineRunner here
        r.run()        # raises once too many files are open, but the message is lost

It seems .run() at some point raises an exception because too many files are open, but for some reason no message is printed – possibly because printing a message would mean opening a file, which is not possible (even importing a module will fail if no file descriptors are available).

When the context manager exits the block, ParallelPipelineRunner.close() is called, which ends up calling PipedPigzWriter.close() (this is in xopen), which then hangs on the line retcode = self.process.wait().
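A minimal illustration (not the actual xopen/Cutadapt code) of why a bare process.wait() can hang, assuming the compression subprocess is still waiting for input because its stdin pipe was never closed after the earlier failure; a wait with a timeout at least makes the hang visible:

    import subprocess

    # Start a compressor that reads from a pipe; gzip stands in for pigz here.
    proc = subprocess.Popen(
        ["gzip", "-c"], stdin=subprocess.PIPE, stdout=subprocess.DEVNULL
    )

    # The stdin pipe is intentionally left open, so gzip never sees EOF.
    try:
        retcode = proc.wait(timeout=10)  # a plain wait() would block forever here
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        print("compression subprocess never exited; its stdin pipe was still open")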