oschwengers / asap

A scalable bacterial genome assembly, annotation and analysis pipeline
https://doi.org/10.1371/journal.pcbi.1007134
GNU General Public License v3.0

[help] pipeline stuck #12

Closed telatin closed 3 years ago

telatin commented 4 years ago

ASA3P (Docker) has been running on 40 samples since July 09 and looks "stuck" at java -jar /asap/share/medusa/medusa.jar (running for more than 80 hours).

What should I check to make sure it's okay?

oschwengers commented 4 years ago

Hi @telatin, in my personal experience this issue with MeDuSa can occur on highly fragmented assemblies, for which it unfortunately sometimes seems never to finish. It can be exacerbated if many reference genomes are used. Could you check the assembly in question? You could either keep waiting for it to finish or kill the MeDuSa process, abandoning this sample for the sake of finishing the project. I'm sorry that I cannot help more, but this is most likely down to MeDuSa, which sometimes has a hard task: either rearranging a large number of contigs within a fragmented assembly or solving a highly complex graph of mapped contigs.
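
The "kill it and move on" option can be automated when a long-running step is launched by hand outside the pipeline. The following is a minimal sketch, not something ASA3P is known to provide: `run_with_timeout` is a hypothetical helper, and the command and time limit passed to it are up to the caller.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments); return its exit code,
    or None if it was still running after timeout_s seconds."""
    try:
        # subprocess.run kills the child process itself when the timeout expires.
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A return value of `None` marks a sample to skip. Note that only the directly started process is killed, so a wrapper script that spawns further children would need extra handling.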

telatin commented 4 years ago

All the samples are E. coli and I'm using a single reference at the moment. Out of 40 isolates, a few (~4) have a more fragmented assembly, probably due to the presence of a large plasmid with repeats. I'm trying to run ASA3P on a batch of 4 isolates that assembled well, but MeDuSa seems to be time-greedy in this case too. Let's see :)

Thanks for the prompt reply.

file                                   num_seqs    sum_len  min_len    avg_len
sequences/Escherichia_coli_MGI1.fasta        30  4,548,806      152  151,626.9
sequences/Escherichia_coli_MGI2.fasta        34  4,567,456      155  134,336.9
sequences/Escherichia_coli_MGL1.fasta        33  4,580,702      153  138,809.2
sequences/Escherichia_coli_MGL2.fasta        38  4,581,184      155  120,557.5
sequences/Escherichia_coli_MGO1.fasta        39  4,579,051      152  117,411.6
sequences/Escherichia_coli_MGS1.fasta        35  4,580,704      341  130,877.3
sequences/Escherichia_coli_MGS2.fasta        36  4,581,915      155  127,275.4
[...]
sequences/Escherichia_coli_NPI1.fasta       169  4,998,571      150   29,577.3
sequences/Escherichia_coli_NPI2.fasta       157  5,004,874      150   31,878.2
sequences/Escherichia_coli_NPL2.fasta       281  5,151,173      150   18,331.6

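
For readers who want to reproduce the columns above on their own assemblies, here is a minimal Python sketch. It assumes plain, uncompressed FASTA files; `fasta_stats` is a hypothetical helper written for this illustration, not part of ASA3P or of whichever tool produced the table.

```python
def fasta_stats(path):
    """Return num_seqs, sum_len, min_len and avg_len for one FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        return {"num_seqs": 0, "sum_len": 0, "min_len": 0, "avg_len": 0.0}
    total = sum(lengths)
    return {
        "num_seqs": len(lengths),
        "sum_len": total,
        "min_len": min(lengths),
        "avg_len": round(total / len(lengths), 1),
    }
```

Calling `fasta_stats` on each file in `sequences/` yields one row per isolate, which makes it easy to spot the more fragmented assemblies before scaffolding starts.
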
oschwengers commented 4 years ago

This is rather strange. Normally, MeDuSa should not take that long. In an analysis of a public E. coli project with heavily fragmented assemblies (400 to 700 contigs!), MeDuSa took only between 1 and 4 minutes.

telatin commented 4 years ago

I'll simply try again then. The small-batch finished btw, but the big lot was still ongoing.

Do you have any "logging" or "debugging" hints (i.e. what to monitor, what to look at)? I'm using a cloud VM and it could be the system not behaving...

oschwengers commented 4 years ago

According to your screenshot, you "achieve" full CPU utilization, so I'd rule out any network/IO issues. That leaves me a bit puzzled. Maybe you could check your reference and make sure that it is not too distant from your isolates? That could be another issue.

oschwengers commented 3 years ago

Hello @telatin , any news on this? Can I help somehow? Otherwise I'd tend to close this issue.

telatin commented 3 years ago

Hi, thanks for checking. I'm still running tests and trying to remove samples with poor assembly quality. It could be interesting to implement some QC after assembly to stop ASAP. MeDuSa is definitely taking a lot of time even with "filtered" inputs, so if I have something new I'll let you know 👍 Best

oschwengers commented 3 years ago

Hi, yep, some sort of post-assembly QC might make sense. However, this is quite difficult to implement, as one has to fiddle with the right thresholds. These will most certainly be very different across users, species and assembly types. So currently, I don't know of an appropriate one-size-fits-all setup. Therefore, I tend to stick to the status quo. But I'm happy to hear any ideas or advice on this. Any progress with your isolates?
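
To make the threshold problem concrete, here is a minimal sketch of what such a post-assembly gate could look like. It is not part of ASA3P: `n50`, `assembly_qc` and both default thresholds are hypothetical placeholders that would need tuning per species and assembly type, which is exactly the difficulty described above.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_qc(lengths, max_contigs=200, min_n50=50_000):
    """Return (passed, reasons) for a list of contig lengths.

    Both thresholds are arbitrary illustrative defaults, not recommendations.
    """
    reasons = []
    if len(lengths) > max_contigs:
        reasons.append(f"too many contigs: {len(lengths)} > {max_contigs}")
    contig_n50 = n50(lengths)
    if contig_n50 < min_n50:
        reasons.append(f"N50 too low: {contig_n50} < {min_n50}")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict lets a user see which threshold rejected a sample and adjust it, rather than silently dropping isolates.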

oschwengers commented 3 years ago

Hi @telatin , any progress or update regarding the MeDuSa issue? I'll close this for now, but please do not hesitate to re-open it in case this is still an issue. Best regards!