ksahlin / NGSpeciesID

Reference-free clustering and consensus forming of long-read amplicon sequencing
GNU General Public License v3.0

Repeating sequences in consensus #28

Closed: josiah-liew closed this issue 8 months ago

josiah-liew commented 11 months ago

Hello, thank you for creating this tool.

I wanted to ask if you have any ideas about what causes the following repeating sequences (highlighted here). Recently, we've noticed that the BLAST output has a low % query coverage. Looking deeper, we've begun to notice sections at the 5'/3' ends of the consensus that seem to repeat themselves and are not part of the organisms identified by BLAST. Manually removing these repeats (and some additional bases) significantly improves both query coverage and identity.
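For anyone who wants to screen consensus sequences for this pattern before running BLAST, here is a minimal Python sketch. It is not part of NGSpeciesID; the function names, the `min_len` threshold, and the toy sequence are all hypothetical. It only finds exact, non-overlapping repeats of the 5' or 3' end, so a repeat containing sequencing errors would be missed.

```python
def longest_repeated_prefix(seq: str, min_len: int = 12) -> str:
    """Return the longest prefix (>= min_len bases) that occurs again,
    without overlapping itself, later in seq. Returns '' if there is none."""
    seq = seq.upper()
    best = ""
    for k in range(min_len, len(seq) // 2 + 1):
        prefix = seq[:k]
        # Search only from position k onward so the prefix cannot match itself.
        if seq.find(prefix, k) == -1:
            break  # a longer prefix cannot repeat if this one does not
        best = prefix
    return best


def longest_repeated_suffix(seq: str, min_len: int = 12) -> str:
    """Same check for the 3' end, done by reversing the sequence."""
    return longest_repeated_prefix(seq[::-1], min_len)[::-1]


if __name__ == "__main__":
    # Toy consensus (made up): a 16-base block duplicated at the 5' end.
    block = "ACGTTGCAAGCTTGCA"
    consensus = block + block + "GGGATCCCTTTAAAGGGCCCTATATAGCGC"
    print("5' repeat:", longest_repeated_prefix(consensus) or "none")
    print("3' repeat:", longest_repeated_suffix(consensus) or "none")
```

A non-empty result flags a consensus worth inspecting by eye; it does not say where the repeat came from.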


ksahlin commented 10 months ago

Sorry, dropped this completely. Looks like it could be part of a barcode/primer that was not removed?

You can specify the parameter --primer_file [FILE], where FILE is a FASTA file with custom primers to be removed after polishing.

HTH, Kristoffer
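To check whether primer or barcode sequence is still present in a consensus, a quick scan like the one below may help. This is a rough sketch and not how NGSpeciesID itself removes primers: the function names are hypothetical, the primer and consensus sequences are made-up placeholders, and only exact matches (forward strand and reverse complement) are reported, so primers with mismatches or IUPAC ambiguity codes would not be found.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer_hits(consensus: str, primers: dict) -> list:
    """Return (primer_name, strand, start) for every exact primer match."""
    target = consensus.upper()
    hits = []
    for name, primer in primers.items():
        forward = primer.upper()
        for strand, query in (("+", forward), ("-", reverse_complement(forward))):
            start = target.find(query)
            while start != -1:
                hits.append((name, strand, start))
                start = target.find(query, start + 1)
    return hits


if __name__ == "__main__":
    # Placeholder primers and consensus, for illustration only.
    primers = {"fwd": "ACGTACGTAA", "rev": "TTGGCCAATT"}
    consensus = "ACGTACGTAA" + "GGGCCCTTTGGGAAACCC" + "AATTGGCCAA"
    for name, strand, start in find_primer_hits(consensus, primers):
        print(f"{name} ({strand}) at position {start}")
```

Hits at the very start or end of the consensus would be consistent with a primer that was not trimmed.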

josiah-liew commented 8 months ago

Hi Kristoffer. Thank you! We definitely did specify --primer_file. Something we've noticed is that setting --s too large produces the following, but adjusting --s down removes the repeating sequences.

Thank you, Josiah
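To see how the choice of --s changes which reads are admitted, a small sketch like this can be run on the read lengths of a sample. It assumes that --s acts as a tolerance around the intended amplicon length, which is worth confirming against the NGSpeciesID help text; the function name, the read lengths, and the target length of 700 are hypothetical. It illustrates the filtering arithmetic only and does not explain why the repeats appear.

```python
def reads_in_window(read_lengths, target_len, window):
    """Keep read lengths within target_len +/- window (assumed --s behaviour)."""
    low, high = target_len - window, target_len + window
    return [n for n in read_lengths if low <= n <= high]


if __name__ == "__main__":
    # Made-up read lengths around a hypothetical 700 bp amplicon.
    lengths = [650, 700, 720, 900, 1400]
    for window in (50, 250, 700):
        kept = reads_in_window(lengths, target_len=700, window=window)
        print(f"window={window}: kept {len(kept)} of {len(lengths)} reads -> {kept}")
```

Comparing the kept lengths across window sizes shows how many reads much longer than the amplicon enter the consensus step as the window grows.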