forestdussault opened this issue 6 years ago
I'm not sure if this is an intended use case for Neptune, but I attempted to run the program with ~150 inclusion genomes (450 MB) and ~8000 exclusion genomes (32 GB), and it crashed before completion. Here is the log from my console:

Thanks for reporting this error.

What's happening is that each aggregation job opens a temporary file for each input file (~150 + ~8000). I suspect the process is hitting the operating system's limit on simultaneously open files at ~8150 handles, and Python is raising this error as a result.

The problem is that there's currently no way to change the input parameters to avoid this. The number of aggregation jobs can be changed, but each job will still try to open as many files simultaneously as there are inputs.

The short-term solution is to run Neptune with fewer input files; the largest run we've done used approximately 800 input files in total. The long-term solution (on my end) might be to perform aggregation in iterative batches, keeping only a reasonable number of files open at any one time.
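As a diagnostic, the per-process open-file limit that ~8150 simultaneous handles would exceed can be inspected, and within the hard limit raised, using Python's standard `resource` module on Linux/macOS. This is a general workaround sketch, not something Neptune itself does; the `needed` value is just the file count from this report:

```python
import resource  # Unix-only (Linux/macOS); not available on Windows

# Inspect the per-process limit on simultaneously open files.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft limit={soft}, hard limit={hard}")

# The soft limit can be raised up to the hard limit for this process.
# 8150 is the approximate handle count from this report, not a Neptune setting.
needed = 8150
if soft < needed and (hard == resource.RLIM_INFINITY or needed <= hard):
    resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
```

Raising the soft limit (or using `ulimit -n` in the shell before launching) may let a run of this size complete, but it only postpones the problem for even larger inputs.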
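The iterative-batch aggregation proposed as the long-term fix can be sketched as follows. The names here (`aggregate`, `MAX_OPEN`, the `combine` callback) are hypothetical and not part of Neptune's actual code; the point is only that no more than `MAX_OPEN` handles are ever open at once:

```python
MAX_OPEN = 512  # conservative cap, well under a typical `ulimit -n` of 1024

def aggregate(paths, combine):
    """Reduce many files to one while respecting the open-file cap.

    `combine(handles)` merges a list of open file objects into a new
    temporary file and returns its path. Inputs are merged in batches,
    round by round, until a single file remains.
    """
    while len(paths) > 1:
        merged = []
        for i in range(0, len(paths), MAX_OPEN):
            handles = [open(p) for p in paths[i:i + MAX_OPEN]]
            try:
                merged.append(combine(handles))
            finally:
                for h in handles:
                    h.close()
        paths = merged
    return paths[0]
```

With ~8150 inputs and a cap of 512, the first round would produce 16 intermediate files and the second round would merge those into one, so the extra cost is a single additional pass over the data.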