off context usage for general amplicons

splaisan commented 4 months ago

Hi, I will soon have ONT amplicon sequences (without UMIs) that are mixtures of different amplification products from the same template mixture (differential exon usage). The sequences will be highly similar (they end with the F and R primers) but will differ in length due to internal splicing. I am having a hard time finding the right tool for this. Can LotuS help me cluster the reads by highest sequence similarity into separate bins and return the best representative sequence and the read count for each bin? Thanks in advance for your guidance, Stephane

hildebra commented 4 months ago

Hey Stephane, no, sorry. We made the conscious decision not to support the current generation of ONT because of its high error rate, which is probably often higher than what the classical 97% identity threshold tolerates. That being said, you could certainly run lotus2 and just drastically lower the error tolerance in sdm_opt.txt (one of them). However, if the sequences are not prefiltered to only contain primer + 16S/18S, this will likely also fail. Best, Falk

splaisan commented 4 months ago

Thanks Falk, this is not 16S but a totally unrelated amplicon. I will look in the direction of MMseqs2 and other clustering tools. Cheers, S
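
For readers landing on this issue, here is a minimal sketch of the task described in the question: binning reads by similarity and reporting one representative sequence plus a read count per bin. It is not part of LotuS or MMseqs2. The function names and the `min_similarity` and `max_length_diff` parameters are hypothetical, and the similarity score is Python's `difflib` ratio rather than a true alignment identity. The length filter is an assumption that splice variants differ in length by more than sequencing noise does.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1] (difflib ratio, not an alignment identity)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(reads, min_similarity=0.9, max_length_diff=10):
    """Greedily bin reads; return (representative, count) pairs, largest bin first.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    bins = []  # each bin: {"seed": first read placed in it, "members": [reads]}
    # Longest reads first, so full-length products seed bins before shorter ones.
    for read in sorted(reads, key=len, reverse=True):
        best_bin, best_score = None, min_similarity
        for b in bins:
            # Skip bins whose seed length differs too much (likely another isoform).
            if abs(len(read) - len(b["seed"])) > max_length_diff:
                continue
            score = similarity(read, b["seed"])
            if score >= best_score:
                best_bin, best_score = b, score
        if best_bin is None:
            bins.append({"seed": read, "members": [read]})
        else:
            best_bin["members"].append(read)

    results = []
    for b in bins:
        # Representative = the most frequent exact sequence in the bin.
        representative = Counter(b["members"]).most_common(1)[0][0]
        results.append((representative, len(b["members"])))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: a "long" product and a "short" one lacking the internal segment.
    long_iso = "ACGTACGTAC" + "GGGGGCCCCCTTTTTAAAAA" + "TGCATGCATG"
    short_iso = "ACGTACGTAC" + "TGCATGCATG"
    noisy = long_iso[:12] + "T" + long_iso[13:]  # one substitution
    reads = [long_iso] * 3 + [noisy] + [short_iso] * 2
    for rep, count in greedy_cluster(reads, max_length_diff=5):
        print(count, rep)
```

With real ONT error rates, exact duplicates are rare, so the most-frequent-sequence representative would likely need replacing with a consensus step, and the pairwise comparison here scales poorly with read count. Dedicated clustering tools are the better route for actual data.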

hildebra / lotus2

off context usage for general amplicons #63