ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Novoplasty Completed Successfully but Did Not Produce Contigs in Output File #231

Open meeranhussain opened 1 month ago

meeranhussain commented 1 month ago

I recently used NOVOPlasty to assemble the mitogenome from short-read data of Microctonus aethiopoides ecotypes. The run completed successfully but did not produce any contigs. I had initially assembled the mitogenomes with Flye from Oxford Nanopore Technologies (ONT) long-read data for eight samples and obtained circularized genomes of 29-32 kb, which is unusually large for insect mitogenomes. To validate these results, I ran NOVOPlasty on the short-read data from the same samples, expecting contigs that I could compare with the Flye assemblies, but none were generated. I also asked on Biostars (https://www.biostars.org/p/9599074/) about the large mitogenomes but did not get useful suggestions for validating them. I would appreciate any suggestions!


ndierckx commented 1 month ago

Insects can have a long repetitive control region, so those lengths are possible. Can you send me the extended log file? It seems the assembly had already reached 25 kbp, so I'm not sure what went wrong. But since you seem to have a long repetitive region, it is best to rely on the Nanopore reads for an accurate length of that region.

meeranhussain commented 1 month ago

Hi, thanks for your reply. This genome does appear to have long repetitive regions, but I am also concerned about potential misassemblies. I say this because I tested the same long-read assembly method on Calliphora sp. ONT data (whose mitogenome is typically 15-16 kb), and Flye produced a 32 kb circular contig, which raises concerns about misassembly. Any suggestions you have would be helpful. I also tried NOVOPlasty with a smaller k-mer value, but it still didn't produce a circular contig. I've attached the log file for your reference. log_extended_Maethio_13 (1).txt
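
One quick sanity check for the doubled-length concern (a 32 kb contig where 15-16 kb is expected) is to ask whether the contig is simply the same genome written out twice. The sketch below is only an illustration and not part of NOVOPlasty or Flye: the function name `repeated_kmer_fraction` and the default `k` are assumptions made for this example. It treats the contig as circular and reports the fraction of distinct k-mers that occur more than once. A value close to 1.0 suggests a full tandem duplication, while a small value suggests mostly unique sequence, with any repeats confined to a region such as the control region.

```python
from collections import Counter


def repeated_kmer_fraction(seq: str, k: int = 21) -> float:
    """Fraction of distinct k-mers seen more than once in a circular sequence.

    Hypothetical helper for illustration only. It looks at the forward
    strand only, so inverted (reverse-complement) repeats are not counted.
    """
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    # Append the first k-1 bases so k-mers spanning the circular junction
    # are included.
    circular = seq + seq[: k - 1]
    counts = Counter(circular[i : i + k] for i in range(len(seq)))
    repeated = sum(1 for n in counts.values() if n > 1)
    return repeated / len(counts)
```

In practice the contig would be read from the assembler's FASTA output and passed in as a plain string. This check cannot tell a genuinely long repetitive control region from a misassembled one; it only flags the case where most of the contig is duplicated.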

ndierckx commented 1 month ago

At least this assembly outputted the assembled sequence, but it is probably not possible to accurately assemble the complete genome with just short reads. Do you also have long reads for this sample?

meeranhussain commented 1 month ago

Yes, I did try assembling with the ONT reads, but it gave a long 32 kb contig with a lot of repeats in the control region, which I think is due to misassembly.

ndierckx commented 1 month ago

If you have short and long reads from the species (preferably from the same sample), it should be easy to assemble. I do have an unpublished hybrid assembler that I used for another user before: https://www.mdpi.com/1422-0067/24/4/3976 I can't share the code yet, but maybe I could run it for you.

I have a new long-read assembler I just put online, which works much better than Flye. I will create a Docker image in the future because Perl modules can be annoying to install on a cluster, but it needs hardly any memory, so maybe you can run it on your desktop or laptop: https://github.com/ndierckx/NOVOLoci

meeranhussain commented 1 month ago

Thanks, that's so nice of you, but I would first like to try your long-read assembler (NOVOLoci). If it still doesn't work, I will come back to you for the hybrid method.