ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

How to generate Seed input file? #191

Closed meeranhussain closed 1 year ago

meeranhussain commented 2 years ago

Hi, I have sequenced 6 different mitochondrial genome samples (paired-end) and have a reference sequence for each sample. I am trying to use NOVOPlasty, but I don't understand what the "seed input" file is. How do I obtain a consensus for my samples?

ndierckx commented 2 years ago

It is a short sequence (100-200 bp or so) used to start the assembly. It can be any mitochondrial sequence and doesn't need to be from the same species. Just don't take a sequence from a repetitive or duplicated region. If your 6 samples are not too distantly related species, you can use the same seed sequence for all of them. Since you have reference sequences, just take a sequence from them.

You can even use the whole reference as the seed sequence, as it will just take the first few hundred bp anyway. I wouldn't recommend that, though: assembled genome sequences often have complex regions at the ends, so it is better to take a sequence from somewhere in the middle, or just a gene.
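As a rough illustration of the advice above, here is a minimal Python sketch that cuts a seed from the middle of a reference FASTA. The function names, the 200 bp default and the output header are assumptions made for this example and are not part of NOVOPlasty. The script does not check whether the chosen window falls in a repetitive or duplicated region, so that still has to be verified by hand.

```python
from pathlib import Path


def read_first_record(fasta_text):
    """Return (header, sequence) of the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; only the first is used
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


def middle_seed(sequence, length=200):
    """Return `length` bases centred on the midpoint of the sequence.

    The 200 bp default is an assumption for this sketch.
    """
    if len(sequence) <= length:
        return sequence
    start = (len(sequence) - length) // 2
    return sequence[start:start + length]


def seed_fasta(fasta_text, length=200, width=70):
    """Build a one-record FASTA string holding the seed."""
    header, seq = read_first_record(fasta_text)
    seed = middle_seed(seq, length)
    name = header.split()[0] if header.split() else "reference"
    body = "\n".join(seed[i:i + width] for i in range(0, len(seed), width))
    return f">seed_from_{name}\n{body}\n"


def write_seed(ref_path, out_path, length=200):
    """Read a reference FASTA from ref_path and write the seed to out_path."""
    Path(out_path).write_text(seed_fasta(Path(ref_path).read_text(), length))
```

For example, `write_seed("reference.fasta", "seed.fasta")` (both file names are hypothetical) would write a single-record FASTA file that can then be given as the seed input in the NOVOPlasty config file. Picking a known gene from the reference instead of the midpoint is an equally valid choice.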

meeranhussain commented 2 years ago

Thank you for your reply. I ran NOVOPlasty on my 5 samples. For 3 of them the output included a circularised FASTA, but for the other 2 it was not created; instead, multiple contigs were produced. Can you help me with how to proceed? I am interested in gene prediction using MITOS in further analysis. Merged_contigs_Novo_plasty_B.txt log_extended_Novo_plasty_B.txt config_B.txt

ndierckx commented 2 years ago

You have 3 repetitive regions, and it is not possible to know their exact length with short reads. But there is no sequence missing, so the assembly is as good as it gets: all 3 contigs are circularly connected through the repetitive regions. Many algorithms will collapse those regions, but I prefer to output separate contigs (you can always merge them manually).