torognes / vsearch

Versatile open-source tool for microbiome analysis

chimera detection: variable number of chunks #501

Closed frederic-mahe closed 1 year ago

frederic-mahe commented 1 year ago

As of now, a given query sequence is split into chunks (or parts) and each chunk is compared with potential parents. The number of chunks is fixed in vsearch's uchime commands and set to four:

https://github.com/torognes/vsearch/blob/3ab323616159ea39598d7ddd1dc9fa38864ee3a1/src/chimera.cc#L74

With the availability of long-read sequencing platforms, it might be interesting either to turn parts into a user-controlled parameter or to make parts vary with the length of the query.
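To make the second option concrete, here is a minimal Python sketch of length-dependent chunking. It is not vsearch's code: the function names, the 100-base target chunk length, and the 4 to 20 clamp are all hypothetical choices made for illustration, and the splitting rule (near-equal contiguous chunks) is an assumption about how a query could be partitioned.

```python
def choose_parts(query_len, target_chunk_len=100, min_parts=4, max_parts=20):
    """Hypothetical rule: about one chunk per `target_chunk_len` bases,
    clamped to the range [min_parts, max_parts]."""
    parts = query_len // target_chunk_len
    return max(min_parts, min(max_parts, parts))


def split_into_chunks(seq, parts):
    """Split `seq` into `parts` contiguous chunks whose lengths differ
    by at most one base (the first chunks absorb any remainder)."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(seq), parts)
    chunks = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    query = "ACGT" * 375  # a 1500-base toy query
    parts = choose_parts(len(query))
    chunks = split_into_chunks(query, parts)
    print(parts, [len(c) for c in chunks])
```

With these assumed defaults, short amplicons keep the current value of four parts, while longer reads get proportionally more chunks up to the cap. Exposing the same values as a command-line option would cover the user-controlled variant.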

torognes commented 1 year ago

I've actually experimented with changing parts when analysing whole rRNA + ITS sequences (~2800 bp), and increasing parts up to 20 had a clear effect. I am also working on changing the code so that it can detect chimeras with 3 or more parents. This requires a bit more coding, though.