muppetjones opened 7 years ago
I've run this a few times, and the only way to guarantee that no threads exit seems to be with MAX_JOB=8 when running with an input VCF. I'm 90% certain it's a memory issue, as the threads start to drop when memory maxes out, but for some reason each job quits quietly (versus running without a VCF and too many jobs, which quits with a MemoryError).
However, splitting the job into more parts and only running 8 at a time still works, but you only get the regions for those 8 jobs. I'm assuming the region split is deterministic (otherwise it wouldn't work in the first place), so you could potentially split as much as you like and just run 8 at a time; unfortunately, this negates any benefit of running regions in parallel.
I don't know if this was an issue before and I just missed it, or if this is something new.
I had the same problem when trying to call gen_reads.py in parallel. It looks like everything is working, and then the jobs abort themselves without a message partway through. It is easy to miss because you think everything ran to completion without error.
COMMAND:
find L*.vcf | sed 's/.vcf$//' | parallel 'python3.6 neat-genreads/gen_reads.py -M 0 -R 150 -o {} -v {}.vcf -c 1 --vcf -r ref.fa'
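In case it helps anyone reproduce the capped-concurrency workaround, here is a minimal Python sketch of the same loop (the file names and flags are copied from the parallel command above; nothing else about them is assumed). Instead of GNU parallel it uses concurrent.futures with max_workers=8, and it checks each child's return code, since a job killed by the kernel OOM killer exits with -9 (SIGKILL) and no traceback, which is exactly the "quits quietly" symptom.

```python
# Sketch: cap concurrency at 8 and surface silently-killed jobs.
# The gen_reads.py invocation mirrors the GNU parallel command above;
# paths and flags are illustrative, not verified against NEAT's CLI.
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_JOBS = 8  # empirically, more than this exhausts memory with a VCF

def run_one(prefix):
    cmd = ["python3.6", "neat-genreads/gen_reads.py",
           "-M", "0", "-R", "150", "-o", prefix,
           "-v", prefix + ".vcf", "-c", "1", "--vcf", "-r", "ref.fa"]
    # Capture the return code: an OOM-killed child returns -9 (SIGKILL),
    # which is easy to miss if you never check it.
    return prefix, subprocess.run(cmd).returncode

prefixes = [path[:-len(".vcf")] for path in glob.glob("L*.vcf")]
with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
    for prefix, code in pool.map(run_one, prefixes):
        if code != 0:
            print(f"{prefix}: exited with {code} -- possibly killed (OOM?)")
```

GNU parallel users can get a similar failure report without leaving the shell by adding --joblog to the original command and scanning the Exitval column afterward.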
For some reason, when I generate reads using a VCF, most of the threads exit silently a few minutes after starting. This seems to happen before the reads are assigned parts of the genome. It is not always the same thread, nor is it always the same number of threads (8-10 still remain).
Command:
Output (before fail)
Edit: This may be due to memory requirements. The parseVCF function contains the unclosed open mentioned in a different issue; however, this still occurs when fixed. The VCF file I'm using is 800 MB.
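For readers who haven't seen the other issue: the pattern in question is an open() whose handle is never closed. A hypothetical reduction of it, and the context-manager fix (function and variable names here are made up for illustration, not NEAT's actual code):

```python
# Illustrative reduction of the unclosed-open pattern in a VCF parser.
# Names are hypothetical; this is not NEAT's parseVCF implementation.

def parse_vcf_leaky(path):
    variants = []
    f = open(path)  # never closed: the handle and its buffers leak
    for line in f:
        if not line.startswith("#"):
            variants.append(line.rstrip("\n").split("\t")[:5])
    return variants

def parse_vcf_fixed(path):
    variants = []
    with open(path) as f:  # closed deterministically when the block exits
        for line in f:
            if not line.startswith("#"):
                variants.append(line.rstrip("\n").split("\t")[:5])
    return variants
```

Note that closing the handle frees only the file object itself; if the parser keeps every record in a list, an 800 MB VCF still inflates resident memory considerably, which would be consistent with the problem persisting after the fix.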