sanger-pathogens / circlator

A tool to circularize genome assemblies
http://sanger-pathogens.github.io/circlator/

Circlator seems to run forever #173

Open tnn111 opened 3 years ago

tnn111 commented 3 years ago

In a number of cases, circlator appears to run forever. It's spades-bwa that's doing it. Has anyone else noticed this? Is that a reason to run SPAdes 3.7?

I tried switching to using canu as an assembler, but it fails because it can't find a GFA file towards the end. Circlator is great as a tool, but it's difficult to get things to work at times.

schmittel commented 3 years ago

Same here. I ran Circlator on a PacBio assembly and it took about 2 hours to complete. I ran it again on another sample (same genome, same size and depth) and it's still running after 36 hours. It is also stuck on spades-bwa.

lauralwd commented 3 years ago

I'm running into this too. In my case spades-bwa is allocated multiple threads but uses only one. I'm running SPAdes 3.13.0 (I couldn't get 3.7.1 to run in the same environment as Circlator). What version of SPAdes are you running?

edit: not using spades 3.7.1 should not be an issue, see more info in this thread: https://github.com/sanger-pathogens/circlator/issues/72

Some extra details.

Long story short, this is likely a SPAdes thing rather than a Circlator thing.

Circlator does several assemblies, one for each of several k-mer values. Most assemblies in my case take just a couple of minutes, but then a certain one gets stuck, seemingly forever. Looking more closely at what spades-bwa is doing, I find that a proper SAM file is created; it is located at 03.assemble*/tmp/corrector*/*/*.sam

The last two lines of the SPAdes log file look like this for me:

[main] Real time: 0.924 sec; CPU: 0.393 sec
  0:00:00.998     4M / 4M    INFO   DatasetProcessor         (dataset_processor.cpp     : 173)   Running bwa mem ...:/home/laura/miniconda3/envs/ciclator/share/spades-3.12.0-2/bin/spades-bwa mem  -v 1 -t 6 /stor/azolla_mitochondrium/assembly/laura/circlator/subset2-dedup1_circlator_v1/03.assemble.tmp.spades.97.r3mz5_pn/misc/assembled_contigs.fasta /stor/azolla_mitochondrium/assembly/laura/circlator/subset2-dedup1_circlator_v1/02.bam2reads.fasta  > /stor/azolla_mitochondrium/assembly/laura/circlator/subset2-dedup1_circlator_v1/03.assemble.tmp.spades.97.r3mz5_pn/tmp/corrector_7cxabnsk/lib0_BOqZpn/tmp.sam

Waiting for this to finish seems pointless: the SAM file looks fine, but spades-bwa hangs for some reason. I'll try some different SPAdes versions to see if that resolves the issue.
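To judge whether spades-bwa is still writing or has really stalled, it can help to check when the SAM file was last modified. Here is a minimal Python sketch, not part of Circlator or SPAdes, that uses the glob pattern mentioned above. The function names and the run directory argument are hypothetical, so adjust them to your own output directory.

```python
import glob
import os
import sys
import time

# Glob pattern taken from the comment above. It is relative to the Circlator
# output directory, which is an assumption about your layout.
SAM_PATTERN = os.path.join("03.assemble*", "tmp", "corrector*", "*", "*.sam")


def find_sam_files(run_dir, pattern=SAM_PATTERN):
    """Return sorted paths of SAM files under run_dir that match pattern."""
    return sorted(glob.glob(os.path.join(run_dir, pattern)))


def idle_report(paths, now=None):
    """Return one (path, size_in_bytes, seconds_since_last_write) tuple per file."""
    now = time.time() if now is None else now
    report = []
    for path in paths:
        info = os.stat(path)
        report.append((path, info.st_size, now - info.st_mtime))
    return report


if __name__ == "__main__":
    # Hypothetical usage: pass the Circlator output directory as the first argument.
    run_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size, idle in idle_report(find_sam_files(run_dir)):
        print(f"{path}\t{size} bytes\tidle for {idle / 60:.1f} min")
```

A SAM file that has not grown for a long time while spades-bwa is still running is consistent with a hang, but it is not proof of one. Compare it against the assemblies that did finish before killing anything.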

My ugly but pragmatic fix:

For me, spades-bwa seems to get stuck only on certain assemblies and not on others. Remember that Circlator does several assemblies at several k-mer values and then chooses the one with the highest N50. You can just remove the k-mer values for which the assemblies are problematic by either: