tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

nanocompore keeps running forever #127

Closed lingolingolin closed 4 years ago

lingolingolin commented 4 years ago

Hi @a-slide @tleonardi ,

I have been running nanocompore on some small input files. The first run went for 3 days before it was killed. I resubmitted the job, and it has now been running for another 4 days. That seems far too long for such small files. Could you let me know what is going wrong?

The command I used is

nanocompore sampcomp -1 ctl.out_eventalign_collapse.tsv -2 trt.out_eventalign_collapse.tsv -f ref.fasta -o ctl_vs_treat --pvalue_thr 1 --logit --nthreads 6 --sequence_context 0 --comparison_methods GMM,KS,MW,TT --overwrite

Thanks in advance.

a-slide commented 4 years ago

It seems that there is a multiprocessing sharing issue when the program hits a very high coverage transcript. Can you try again using the down-sampling option --downsample_high_coverage 5000?

a-slide commented 4 years ago

I think this is the same issue as https://github.com/tleonardi/nanocompore/issues/120

tleonardi commented 4 years ago

Hi @lingolingolin, I had a look at the files that you linked. I think the issue is exactly what @a-slide said: the coverage is too high to be handled properly by our multiprocessing queues. You have ~50,000 and ~20,000 reads in the 2 samples respectively, and they are all mapped to the same reference, i.e. CoV-2 genomic RNA. You could try downsampling like @a-slide suggested. However, keep in mind that both Nanocompore and Nanopolish work at the transcript level, i.e. they don't support "spliced" alignments such as those from sgRNAs.
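
For readers wondering what the suggested downsampling amounts to, here is a minimal sketch of capping coverage per reference. It is not Nanocompore's actual implementation: the function name `downsample_reads`, the `(read_id, ref_id)` input shape, and the fixed seed are all assumptions made for illustration. The idea is simply that any reference with more reads than the cap keeps a random subset of that size, so a single very deep transcript no longer sends tens of thousands of reads through the worker queues.

```python
import random
from collections import defaultdict


def downsample_reads(read_to_ref, max_reads=5000, seed=0):
    """Keep at most `max_reads` randomly chosen reads per reference.

    Hypothetical helper for illustration only; not part of Nanocompore.
    `read_to_ref` is an iterable of (read_id, ref_id) pairs.
    Returns a dict mapping ref_id -> list of kept read_ids.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group read ids by the reference they are mapped to.
    by_ref = defaultdict(list)
    for read_id, ref_id in read_to_ref:
        by_ref[ref_id].append(read_id)

    kept = {}
    for ref_id, reads in by_ref.items():
        if len(reads) > max_reads:
            # Sample without replacement down to the cap.
            reads = rng.sample(reads, max_reads)
        kept[ref_id] = reads
    return kept


# Toy data: 50,000 reads on one reference (similar in size to the larger
# sample discussed above) plus a small, low-coverage reference.
reads = [(f"read_{i}", "genomic_RNA") for i in range(50_000)]
reads += [(f"small_{i}", "other_ref") for i in range(10)]

kept = downsample_reads(reads, max_reads=5000)
print(len(kept["genomic_RNA"]))  # 5000
print(len(kept["other_ref"]))    # 10
```

References below the cap are left untouched, so only the very deep transcript is affected.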