ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Identification of possible contamination #174

Open ycuenot opened 3 years ago

ycuenot commented 3 years ago

Dear Mr Dierckxsens,

I am using NOVOPlasty to assemble an insect mitochondrial genome. I am wondering whether contamination can be identified (for example, contamination introduced during DNA purification). I took one of the sequences assembled by NOVOPlasty and mapped the reads back to it myself, and I could see a lot of polymorphism. How does NOVOPlasty assemble when the majority of positions are polymorphic because of contamination? Thank you, Yves

ndierckx commented 3 years ago

The polymorphism could have different origins (sequencing errors, NUMTs, contamination). Just because an aligner maps a read does not mean that NOVOPlasty considers that read for assembly. If you send a picture of the alignment, I can get an idea of what kind of polymorphism you mean.

If the contamination is a small fraction, NOVOPlasty will just ignore it. Are you interested in the contamination or only in the majority sequence?

ndierckx commented 3 years ago

Hi,

I don't see any screenshot. It is also best not to paste the complete log into the text box; you can attach it as a file.

ycuenot commented 3 years ago

Thank you for the answer.

I am sending you a screenshot of the reads aligned to the NOVOPlasty result, where you will see the polymorphism. I am also sending the extended log of the NOVOPlasty run (I got two contigs that I could merge). I am afraid of failing to identify possible contamination between samples (insects) that could occur during DNA extraction.

Do you think I can use the heteroplasmy option to identify polymorphism due to possible contamination?

Thank you very much, Yves

Attachments: Screen Shot 2021-06-14 at 10 12 05 AM, log_extended_DIAG062.txt

ndierckx commented 3 years ago

Hi,

The heteroplasmy function could be a good way to find the source of the polymorphisms. There are some repetitive regions in the mitochondrial genome, so I would not test on the complete genome; try with a sequence of 5000 bp or so.
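Cutting such a fragment out of the assembled sequence could look roughly like the sketch below. It is plain Python and not part of NOVOPlasty; the function names, the file names in the usage comments, and the choice of start position are all assumptions for illustration. How the fragment is then passed to the heteroplasmy option depends on the NOVOPlasty config file and is not shown here.

```python
def read_first_fasta_record(path):
    """Return (header, sequence) for the first record in a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    break  # a second record starts here; keep only the first
                header = line[1:]
            else:
                chunks.append(line)
    return header, "".join(chunks)


def extract_fragment(sequence, start=0, length=5000):
    """Return up to `length` bases beginning at 0-based position `start`."""
    return sequence[start:start + length]


def write_fasta(path, header, sequence, width=60):
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    with open(path, "w") as handle:
        handle.write(f">{header}\n")
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")


# Usage (hypothetical file names; pick `start` so the window avoids
# the repetitive regions of the assembly):
#   header, seq = read_first_fasta_record("assembled_mito.fasta")
#   fragment = extract_fragment(seq, start=1000, length=5000)
#   write_fasta("fragment_5000bp.fasta", header + "_fragment", fragment)
```

The start position matters more than the exact length: the window should sit in a region without repeats, which has to be checked against the assembly itself.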