ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Seed sequence for plant mitochondrial genome #151

Open bioramg opened 4 years ago

bioramg commented 4 years ago

Hello, I would like to assemble plant mitochondrial genomes using NOVOPlasty. I read your recently published article, which mentions that RuBP was used as the seed sequence for the cp genome. But which gene could be used as a seed sequence for plant mitochondrial genome assembly? Also, can I use multiple mitochondrial genes as the seed sequence for this de novo assembly? I gave a 295 kb contig as the seed for plant mitochondrial assembly and obtained a 430 kb contig in the output file. But the 430 kb contig is not similar to the input seed contig. Which one is correct? Thank you.

ndierckx commented 4 years ago

If you have an assembly that you know is correct, you can use that as the seed, but then you need to set the option "extend seed directly" to yes. It is important that the ends are correct; if in doubt, I would clip 200 bp or so from them (there are often more mistakes at the ends of assembled contigs).
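A minimal sketch of the clipping step, in Python: it trims a fixed number of bases (200 by default, following the suggestion above) from both ends of a trusted contig and writes the result as a single-record FASTA file, which could then be given as the seed with "extend seed directly" set to yes. The function names and any file names are hypothetical; nothing here is part of NOVOPlasty itself.

```python
def clip_ends(seq: str, clip: int = 200) -> str:
    """Trim `clip` bases from both ends of an assembled contig.

    Contig ends are where assembly errors tend to accumulate, so they are
    removed before the contig is reused as a seed.
    """
    if clip < 0:
        raise ValueError("clip must be non-negative")
    if len(seq) <= 2 * clip:
        raise ValueError("contig is too short to clip %d bp from each end" % clip)
    # Explicit end index so that clip=0 returns the whole sequence.
    return seq[clip:len(seq) - clip]


def write_fasta(path: str, header: str, seq: str, width: int = 60) -> None:
    """Write a single-record FASTA file, wrapping lines at `width` bases."""
    with open(path, "w") as handle:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

For example, `write_fasta("seed_clipped.fasta", "mt_contig_clipped", clip_ends(contig))` would write the clipped contig to a new seed file (hypothetical names).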

If you use this seed without "extend seed directly", it will only use the first 200 bp or so, because it only uses the seed to extract one seed from the dataset.

So it is best to use a short seed from a region that is not in the chloroplast genome.
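One way to pick such a region, sketched here as an assumption rather than anything NOVOPlasty does internally: slide a window over a mitochondrial contig and keep the first window that shares no k-mer with either strand of the chloroplast genome, so the seed should not also match chloroplast reads. The window length (500 bp), k-mer size (25) and step (50 bp) are illustrative defaults, not recommended values, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Set of all k-mers in `seq`, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_mito_seed(contig: str, chloroplast: str, length: int = 500,
                   k: int = 25, step: int = 50):
    """Return (start, seed) for the first window of `contig` that shares no
    k-mer with either strand of the chloroplast genome, or None if every
    window overlaps chloroplast sequence."""
    cp_kmers = kmers(chloroplast, k) | kmers(revcomp(chloroplast), k)
    for start in range(0, len(contig) - length + 1, step):
        window = contig[start:start + length]
        if kmers(window, k).isdisjoint(cp_kmers):
            return start, window
    return None
```

Shared exact k-mers are only a rough proxy for "not in the chloroplast genome"; plant mitochondrial genomes often carry chloroplast-derived segments, which is one reason to screen the seed at all.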

Have you already assembled the chloroplast genome?

bioramg commented 3 years ago

Thank you for your response. Yes, I have assembled the chloroplast genome. I also have 6 contigs assembled with the SPAdes assembler, and these six contigs contain all the mitochondrial genes. So, should I extract just one mitochondrial gene from a contig and use it as the seed input?

ndierckx commented 3 years ago

Yes, you could do that. Don't forget to add the FASTA file of the chloroplast sequence to the config file.
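Extracting a single gene such as cox2 from one of the SPAdes contigs only needs the gene's coordinates on that contig, for example from a BLAST hit or an annotation. The sketch below assumes 1-based, inclusive coordinates and uses hypothetical function names; the optional flank simply adds a few bases on each side of the gene.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record_id: sequence}; the ID is the first
    word of the header line."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_region(seq: str, start: int, end: int, flank: int = 0) -> str:
    """Return seq[start..end] (1-based, inclusive), extended by `flank`
    bases on each side where the contig allows."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates outside the contig")
    lo = max(0, start - 1 - flank)
    hi = min(len(seq), end + flank)
    return seq[lo:hi]
```

For example, `extract_region(contigs[contig_id], start, end, flank=100)` would return the gene plus up to 100 bp on each side, ready to be written out as the seed file.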

bioramg commented 3 years ago

Yes. I have included the chloroplast genome sequence and used the cox2 gene as the seed sequence for the mitochondrial genome assembly. But I could not obtain consistent results. Should I improve or modify any other parameters?

ndierckx commented 3 years ago

You can send me the log file and I can check the parameters. But plant mitochondrial genomes are very hard to assemble; even with long reads it mostly fails.

bioramg commented 3 years ago

Thank you. Unfortunately, I deleted the data from the cox2 gene seed run. If you need it, I can re-run it and send you the log file.

I am enclosing the log file from the run with the 295 kb contig as the seed.

log_cmo_mt.txt

ndierckx commented 3 years ago

You do get quite large contigs; that is better than most cases I have seen. You can try different assemblers, but I don't think you will succeed in getting a circular genome.
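To put numbers on contig sizes when comparing runs or assemblers, a short summary of the output contigs is enough. This sketch assumes the contigs are available as FASTA text and uses the usual definition of N50 (the length L such that contigs of length L or longer make up at least half of the total assembled length). Function names are hypothetical, and these statistics describe contiguity only; they say nothing about whether a contig is correct.

```python
def contig_lengths(fasta_text: str) -> list:
    """Return the sequence length of every record in FASTA text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list) -> int:
    """Largest L such that contigs of length >= L cover at least half
    of the total length (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: list) -> dict:
    """Contig count, total size, largest contig and N50, in bp."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
    }
```

For example, `summarize(contig_lengths(open("contigs.fasta").read()))` would summarize one output file (hypothetical file name).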