ksahlin / NGSpeciesID

Reference-free clustering and consensus forming of long-read amplicon sequencing
GNU General Public License v3.0

Repeating sequences in consensus #28

Closed: josiah-liew closed this issue 8 months ago

josiah-liew commented 11 months ago

Hello, thank you for creating this tool.

I wanted to ask whether you have any ideas about why the following repeating sequences (highlighted below) appear. Recently we've noticed that our BLAST results show low query coverage. Looking deeper, we found segments of the consensus that repeat themselves at the 5' and 3' ends and do not belong to the organisms BLAST identifies. Manually removing these repeats (plus a few additional bases) significantly improves both query coverage and percent identity.

[Image: cropped NGSpeciesID consensus sequences with the repeating segments highlighted]
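To check a consensus for this pattern before running BLAST, a small script can look for an exact tandem repeat at either end and collapse it to one copy. This is a minimal sketch under stated assumptions: the function names `terminal_tandem_repeat` and `collapse_end_repeats` and the unit-length limits are hypothetical, it is not part of NGSpeciesID, and it only detects exact repeats (nanopore consensus repeats may carry mismatches that it would miss).

```python
# Hypothetical diagnostic, not part of NGSpeciesID: find exact tandem repeats
# at the 5' and 3' ends of a consensus and collapse each to a single copy.
# Repeats containing sequencing errors are not detected by this exact match.

def terminal_tandem_repeat(seq, min_unit=8, max_unit=60):
    """Return (unit_length, copies) of the longest exact tandem repeat
    starting at position 0, or None if no unit occurs at least twice."""
    best = None
    for k in range(min_unit, min(max_unit, len(seq) // 2) + 1):
        copies = 1
        # Count back-to-back copies of the candidate unit seq[:k].
        while seq[copies * k:(copies + 1) * k] == seq[:k]:
            copies += 1
        # Prefer the repeat that covers the most bases.
        if copies >= 2 and (best is None or k * copies > best[0] * best[1]):
            best = (k, copies)
    return best


def collapse_end_repeats(seq, min_unit=8, max_unit=60):
    """Keep one copy of any exact tandem repeat found at either end."""
    hit = terminal_tandem_repeat(seq, min_unit, max_unit)
    if hit:
        k, copies = hit
        seq = seq[(copies - 1) * k:]
    # Plain reversal (not reverse complement) turns a 3' repeat into a 5' one.
    rev = seq[::-1]
    hit = terminal_tandem_repeat(rev, min_unit, max_unit)
    if hit:
        k, copies = hit
        rev = rev[(copies - 1) * k:]
    return rev[::-1]
```

For example, a made-up consensus built as `unit5 * 3 + core + unit3 * 2` comes back as `unit5 + core + unit3`. Whether one remaining copy should also be removed depends on what the repeated unit turns out to be (for example a primer or barcode), so the sketch leaves that decision to the caller.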

ksahlin commented 10 months ago

Sorry, I dropped this completely. It looks like it could be part of a barcode or primer that was not removed?

You can specify the parameter --primer_file [FILE], where FILE is a FASTA file with custom primers that are removed after polishing.

HTH, Kristoffer
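To see what end-trimming against a primer file involves, here is a sketch that parses a primer FASTA (the kind of file passed with --primer_file) and cuts a consensus at the innermost exact primer hit near each end, checking both primer orientations. This is a hypothetical illustration, not NGSpeciesID's own primer-removal code, which may use approximate matching and different search regions; `read_fasta`, `trim_primers` and the `window` parameter are names introduced here, and the consensus is assumed to be uppercase.

```python
# Hypothetical illustration, not NGSpeciesID's implementation: parse a primer
# FASTA and trim a consensus to the innermost exact primer hit at each end.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def trim_primers(consensus, primers, window=100):
    """Drop everything up to the last primer hit in the first `window` bases
    and from the first primer hit in the last `window` bases."""
    w = min(window, len(consensus) // 2)
    sites = [s for p in primers.values() for s in (p, revcomp(p))]
    start = 0
    for s in sites:
        i = consensus.rfind(s, 0, w)  # rightmost hit lying fully inside [0, w)
        if i != -1:
            start = max(start, i + len(s))
    end = len(consensus)
    for s in sites:
        j = consensus.find(s, max(start, len(consensus) - w))  # leftmost 3' hit
        if j != -1:
            end = min(end, j)
    return consensus[start:end]
```

Cutting at the innermost hit rather than the outermost one is the design choice that matters for this issue: if a primer is repeated back to back at an end, trimming at the first copy alone would leave the second copy in the consensus.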

josiah-liew commented 8 months ago

Hi Kristoffer, thank you! We did specify --primer_file. What we've noticed is that too large a --s value produces these repeating sequences, while lowering --s removes them.

Thank you, Josiah
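One possible explanation, offered only as a hypothesis: assuming --s is the maximum allowed deviation from the intended amplicon length given with --m, a larger --s admits reads that are longer than the amplicon by roughly a primer or barcode length, and such reads could contribute extra terminal copies to the consensus. The sketch below shows that arithmetic; `in_length_window` and `reads_admitted_only_by` are hypothetical helpers, and the read lengths in the example are made up rather than taken from this thread.

```python
# Hypothetical sketch of a length-window read filter. Assumption: --m is the
# intended amplicon length and --s the maximum allowed deviation from it, so
# a read is kept when m - s <= length <= m + s.

def in_length_window(read_length, m, s):
    """True if the read length lies within m +/- s (inclusive)."""
    return abs(read_length - m) <= s


def reads_admitted_only_by(lengths, m, s_small, s_large):
    """Lengths accepted with the larger s but rejected with the smaller one."""
    return [
        n for n in lengths
        if in_length_window(n, m, s_large) and not in_length_window(n, m, s_small)
    ]


# Made-up read lengths around a 750 bp target amplicon.
lengths = [690, 735, 750, 768, 812, 845]
print(reads_admitted_only_by(lengths, m=750, s_small=50, s_large=100))
```

With these illustrative numbers, raising s from 50 to 100 additionally admits the 690, 812 and 845 bp reads. Inspecting the ends of the extra long reads would be one way to test whether they carry the repeated segments.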