Closed telatin closed 3 years ago
Hi @telatin, in my experience this issue with MeDuSa can occur on highly fragmented assemblies, which unfortunately sometimes never seem to finish. It can be exacerbated if many reference genomes are used. Could you check the assembly in question? You could either keep waiting for it to finish or kill the MeDuSa process and abandon this sample for the sake of finishing the project. I'm sorry that I cannot help more, but this is most likely down to MeDuSa, which sometimes has a hard task: either rearranging a large number of contigs within a fragmented assembly or solving a highly complex graph of mapped contigs.
All the samples are E. coli and I'm using a single reference at the moment. Out of 40 isolates, a few (~4) have a more fragmented assembly, probably due to the presence of a large plasmid with repeats. I'm trying to run ASA3P on a batch of 4 isolates that assembled well, but MeDuSa seems to be time-greedy in this case too, let's see :)
Thanks for the prompt reply.
file                                   num_seqs    sum_len  min_len    avg_len
sequences/Escherichia_coli_MGI1.fasta        30  4,548,806      152  151,626.9
sequences/Escherichia_coli_MGI2.fasta        34  4,567,456      155  134,336.9
sequences/Escherichia_coli_MGL1.fasta        33  4,580,702      153  138,809.2
sequences/Escherichia_coli_MGL2.fasta        38  4,581,184      155  120,557.5
sequences/Escherichia_coli_MGO1.fasta        39  4,579,051      152  117,411.6
sequences/Escherichia_coli_MGS1.fasta        35  4,580,704      341  130,877.3
sequences/Escherichia_coli_MGS2.fasta        36  4,581,915      155  127,275.4
[...]
sequences/Escherichia_coli_NPI1.fasta       169  4,998,571      150   29,577.3
sequences/Escherichia_coli_NPI2.fasta       157  5,004,874      150   31,878.2
sequences/Escherichia_coli_NPL2.fasta       281  5,151,173      150   18,331.6
This is rather strange. Normally, MeDuSa should not take that long. When I analyzed a public E. coli project with heavily fragmented assemblies (400 to 700 contigs!), MeDuSa took only between 1 and 4 minutes per assembly.
I'll simply try again then. The small batch finished, by the way, but the big lot was still running.
Do you have any "logging" or "debugging" hint? (i.e. what to monitor, what to look at). I'm using a cloud VM and it could be the system not behaving...
According to your screenshot, you "achieve" full CPU utilization, hence I'd exclude any network/IO issues. That leaves me a bit puzzled. Maybe you could check your reference and make sure that it is not too distantly related to your isolates? That could be another cause.
Hello @telatin , any news on this? Can I help somehow? Otherwise I'd tend to close this issue.
Hi, thanks for checking. I'm still running tests and trying to remove samples with poor assembly quality. It could be interesting to implement some QC after assembly to stop ASAP. MeDuSa is definitely taking a lot of time with "filtered" inputs as well, so if I have something new I'll let you know 👍 Best
Hi, yep, some sort of post-assembly QC might make sense. However, this is quite difficult to implement, as one has to fiddle with the right thresholds. These will most certainly be very different for different users/species/assembly types. So currently, I don't know of an appropriate one-size-fits-all setup. Therefore, I tend to stick to the status quo. But I'm happy for any ideas/advice on this.
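For illustration only, here is a minimal Python sketch of what such a post-assembly check could look like. It is not part of ASA3P or MeDuSa. The function names and the default thresholds `max_contigs` and `min_n50` are hypothetical placeholders that would need tuning per species and assembly type, which is exactly the difficulty described above.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running sum (longest first) reaches half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_qc(lengths: List[int], max_contigs: int = 200, min_n50: int = 50_000) -> Dict:
    """Summarize an assembly and flag it against hypothetical, user-chosen thresholds."""
    stats = {
        "num_seqs": len(lengths),
        "sum_len": sum(lengths),
        "min_len": min(lengths, default=0),
        "n50": n50(lengths),
    }
    stats["passed"] = stats["num_seqs"] <= max_contigs and stats["n50"] >= min_n50
    return stats
```

To check one assembly, open the FASTA file and pass the file handle straight to `contig_lengths`, then hand the result to `assembly_qc` with thresholds that suit the data set. Samples where `passed` is false could then be reviewed or excluded before scaffolding.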
Any progress with your isolates?
Hi @telatin, any progress or update regarding the MeDuSa issue? I'll close this for now, but please do not hesitate to re-open it in case this is still an issue. Best regards!
ASA3P (Docker) has been running on 40 samples since July 09 and looks "stuck" at
java -jar /asap/share/medusa/medusa.jar
(running for more than 80 hours). What should I check to make sure it's okay?