ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Redundancy In Assembly of Test Set for Chloroplasts #163

Closed Franny97 closed 3 years ago

Franny97 commented 3 years ago

Hello,

We have tried doing the assembly using the test set for chloroplasts that you provide on this GitHub repository and we analyzed the assembly. We generate 3 different contigs, which form 2 distinct assemblies.

When analyzing the k-mer distribution with kat, this is what we obtain:

[Image: contig2Reads1-main mx spectra-cn]

Do you have any idea why the assembly is redundant?

Thanks a lot

ndierckx commented 3 years ago

Hi,

It is explained in the wiki:

https://github.com/ndierckx/NOVOPlasty/wiki/Interpretation-&-post-processing

It is impossible to resolve the orientation of the region between the inverted repeats, so it creates an additional contig.

Different output files are explained here: https://github.com/ndierckx/NOVOPlasty/wiki/Output-files

If you use a reference genome, it will resolve the contigs automatically and you will get one circular contig.
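To illustrate why the orientation cannot be resolved from the reads alone, here is a minimal sketch in Python. The sequences `LSC`, `IR` and `SSC` are made-up toy strings and the helper names are hypothetical; none of this is NOVOPlasty code. It builds the two possible arrangements of a circular genome with an inverted repeat and checks that, for k-mers shorter than the repeat, both arrangements contain exactly the same canonical k-mers with the same counts.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmer_counts(seq: str, k: int) -> Counter:
    """Count canonical k-mers of a circular sequence.

    A k-mer and its reverse complement are counted as the same k-mer.
    """
    circular = seq + seq[: k - 1]  # wrap around the circle
    counts = Counter()
    for i in range(len(seq)):
        kmer = circular[i : i + k]
        counts[min(kmer, revcomp(kmer))] += 1
    return counts


# Toy regions (assumptions for illustration only).
LSC = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"  # large single-copy region
IR = "GGTTCCAAGTCTGA"                    # one copy of the inverted repeat
SSC = "TTGACCGTAAGC"                     # small single-copy region

# The two arrangements differ only in the orientation of the SSC.
arrangement_1 = LSC + IR + SSC + revcomp(IR)
arrangement_2 = LSC + IR + revcomp(SSC) + revcomp(IR)

k = 8  # shorter than the inverted repeat
same_kmers = canonical_kmer_counts(arrangement_1, k) == canonical_kmer_counts(arrangement_2, k)
print("sequences differ:", arrangement_1 != arrangement_2)
print("identical k-mer content:", same_kmers)
```

In this toy case the two sequences differ while their k-mer content is identical, so only information that spans a whole repeat copy, or an external reference, could choose between them.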

Franny97 commented 3 years ago

Hi,

Thanks for your reply. The problem is that the k-mer distribution looks like this for each generated contig. Is it still normal, then, for certain k-mers to appear twice in the assembly solely because of the inverted repeats?

Thanks for your quick reply.

ndierckx commented 3 years ago

Of course. The two copies of the inverted repeat are completely identical, so many k-mers will map to both. Each contig contains the inverted repeat, so that is normal.

I don't really understand what the problem is. Don't all cp genomes have inverted repeats?
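As a rough illustration of the point above, the following Python sketch counts canonical k-mers in a toy circular genome that contains an inverted repeat. The sequences and helper names are assumptions made up for this example; this is not how kat or NOVOPlasty is implemented. Every k-mer that lies inside the repeat is found at least twice in the assembly, once per repeat copy, which is what shows up as a 2x component in a copy-number spectrum.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by one string."""
    return min(kmer, revcomp(kmer))


# Toy regions (assumptions for illustration only).
LSC = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
IR = "GGTTCCAAGTCTGA"
SSC = "TTGACCGTAAGC"
genome = LSC + IR + SSC + revcomp(IR)  # circular toy assembly

k = 8
circular = genome + genome[: k - 1]  # wrap around the circle
counts = Counter(canonical(circular[i : i + k]) for i in range(len(genome)))

# Canonical k-mers that lie entirely inside one repeat copy.
ir_kmers = {canonical(IR[i : i + k]) for i in range(len(IR) - k + 1)}

# Each repeat k-mer occurs once per copy, so at least twice in total.
for kmer in sorted(ir_kmers):
    print(kmer, counts[kmer])

# Histogram: how many distinct k-mers occur once, twice, ...
print(sorted(Counter(counts.values()).items()))
```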

Franny97 commented 3 years ago

No, all the assemblies generated with the provided test files (one for each possible inverted-repeat arrangement) have this k-mer distribution. Initially I thought it was a problem, but it is normal to see this if we consider the inverted repeats.

Thanks for your help.