Closed svkazakov closed 7 years ago
Looks like a bug. Could you share all the files with me please so I can debug it?
Thanks, Martin
I have attached the whole working directory. Is that enough? circSeq.1-3-main.zip
Sergey.
That's great, thanks. I'll have a look into it...
I agree, that's definitely a bug. Circlator works by trusting that the SPAdes assembly is correct, which is unfortunate in this case because SPAdes has put that k-mer at each end of the contig. I will think about whether I can do anything to catch this case. However, Circlator was designed for long reads, not short Illumina reads, and I have never seen this happen with long reads.
Yes, I know that Circlator was designed for long reads (usually PacBio or Oxford Nanopore), but in my case we have only two libraries with relatively long reads: a 454 library with an average read length of 450 bp and an Illumina library with an average read length of 201 bp. In both cases I got a common k-mer that is present twice in the final contig.
Even 454 reads are shorter than circlator was designed for - I wouldn't consider 450bp to be long. It's expecting reads that are kilobases in length.
Hi, I've run circlator on my own data, and it looks like, after changing the start position, it keeps a common k-mer that SPAdes placed at both ends of a contig. Shouldn't circlator remove one of the copies?
So there was a common k-mer in a contig of the input assembly, but the contig wasn't circular (see the Bandage visualization in initial-assembly.png, plus the file 00.input_assembly.fasta). After the local assembly made by SPAdes 3.7.1 in task 03.assembly, the contig becomes circular (see after-local-assembly.png, plus the attached file 03.assemble-contigs.fasta). It now has a common k-mer at both ends, 127 nucleotides long for the k=127 that was used (see common-k-mer.png). This contig NODE1* is then circularized using nucmer in task 04.merge, but the file 04.merge.fasta still contains the common k-mer at both ends of the contig! In the final output this contig is reversed and its start position is changed, but the reverse-complement copy of the common k-mer is still present twice in the middle of the final contig.
It is a bug, isn't it?
circlator.log.txt
other-files.zip
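The duplicated terminal k-mer described in the report can be checked for independently of Circlator. The following is a minimal sketch in plain Python, not Circlator's own code: the function names `longest_terminal_overlap`, `trim_terminal_overlap` and `count_both_strands` are hypothetical, and it assumes the contig is available as a plain string and that the overlap is an exact match (as it would be for a k-mer repeated by the assembler).

```python
def longest_terminal_overlap(seq: str, max_len: int) -> int:
    """Length of the longest prefix of seq that is also its suffix, capped at max_len."""
    seq = seq.upper()
    limit = min(max_len, len(seq) - 1)  # never compare the whole sequence with itself
    for n in range(limit, 0, -1):
        if seq[:n] == seq[-n:]:
            return n
    return 0


def trim_terminal_overlap(seq: str, max_len: int) -> str:
    """Drop the suffix copy of a terminal overlap so it occurs once on the circle."""
    n = longest_terminal_overlap(seq, max_len)
    return seq[:-n] if n else seq


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only; other characters pass through)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def count_both_strands(seq: str, kmer: str) -> int:
    """Count occurrences of kmer or its reverse complement in seq, overlaps included."""
    seq = seq.upper()
    total = 0
    # A set avoids double counting when the k-mer is its own reverse complement.
    for pattern in {kmer.upper(), reverse_complement(kmer.upper())}:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


# Toy example with k=5 instead of the k=127 from the report.
contig = "ACGTA" + "GGCCTTAAGG" + "ACGTA"
overlap = longest_terminal_overlap(contig, max_len=5)
trimmed = trim_terminal_overlap(contig, max_len=5)
```

For the case in the report, `max_len` would be set to the SPAdes k value (127). After the contig has been reversed and its start moved, `count_both_strands` on the final contig with the original k-mer would show whether both copies survived.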