ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Different output fasta file names #138

Closed: dcraheem closed this issue 4 years ago

dcraheem commented 4 years ago

I have been assembling an mt genome of a marine mollusc species from paired-end Illumina data. I am experimenting with using (a) a seed alone and (b) a seed and a reference sequence together. The reference sequence and the data being assembled belong to two different species in the same genus. The seed is the first 1000 bp of the CO1 region of the reference sequence.

The assembled sequences output by NOVOPlasty are identical for both analyses, but the names of the output fasta files differ. When only the seed is used, the fasta file name begins with ‘Uncircularized_assemblies’, whereas when both the seed and the reference sequence are used, the output fasta file name starts with ‘Contigs_1’. Is this anything to worry about?
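One way to confirm that two differently named outputs really hold the same sequence is to compare them while ignoring the FASTA headers. The sketch below is an independent check, not part of NOVOPlasty; the function names are hypothetical, and the rotation-aware comparison is an extra assumption, included because a circular genome can be reported from a different start position.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def same_sequences(text_a: str, text_b: str) -> bool:
    """True if both FASTA texts contain the same sequences in the same order (headers ignored)."""
    return [s for _, s in read_fasta(text_a)] == [s for _, s in read_fasta(text_b)]


def same_circular(a: str, b: str) -> bool:
    """True if b is a rotation of a, i.e. the same circular molecule with a different start."""
    return len(a) == len(b) and a in b + b
```

To use it on the two test outputs, read each file with `open(path).read()` (substitute your own file paths) and pass the two strings to `same_sequences`.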

I have attached the log file, the contigs_tmp file and the fasta output for the two tests. Test_a.zip Test_b.zip


ndierckx commented 4 years ago

The reference is only used to resolve ambiguous regions, so it could be that the reference resolved the last split in the assembly, just before it circularized: after 16010 bp, the assembly had several possible extensions that couldn't be resolved without the reference. This can happen, but don't worry; it seems the assembly was circularized.

Unless that region at the end is repetitive; in that case I should double-check it.
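A rough way to judge whether the end region is repetitive is to ask what fraction of its k-mers also occur earlier in the assembly, on either strand. This is a minimal sketch under stated assumptions, not how NOVOPlasty decides anything: the function names are hypothetical, `k=21` is an arbitrary default, the split position (for example 16010 in this thread) must be supplied by you, and no particular cutoff is implied.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def repeated_kmer_fraction(assembly: str, start: int, k: int = 21) -> float:
    """Fraction of k-mers in assembly[start:] also found in assembly[:start], forward or reverse strand.

    Values near 1.0 suggest the tail is largely a copy of earlier sequence (a possible repeat);
    values near 0.0 suggest it is unique. The split position `start` is an assumption you provide.
    """
    seq = assembly.upper()
    head, tail = seq[:start], seq[start:]
    tail_kmers = kmers(tail, k)
    if not tail_kmers:
        return 0.0
    head_kmers = kmers(head, k) | kmers(revcomp(head), k)
    return len(tail_kmers & head_kmers) / len(tail_kmers)
```

For a real assembly you would call something like `repeated_kmer_fraction(seq, 16010)` on the circularized sequence; a short, non-unique tail can still score high by chance, so treat the number as a prompt for a closer look rather than a verdict.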

dcraheem commented 4 years ago

Many thanks for the info.