k-mer matched synthetic controls reported as unknowns

peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool

https://thapbi-pict.readthedocs.io/

MIT License


k-mer matched synthetic controls reported as unknowns #415

Status: Open. peterjc opened this issue 2 years ago.

peterjc commented 2 years ago

The k-mer matching in the prepare-reads stage (which picks out synthetic control reads for use in setting thresholds) is more relaxed than our strict classifiers. That is reasonable if the synthetic controls are truly random and so will not be confused with a biological sequence, even allowing for PCR oddities. However, it means reads that prepare-reads matched to a synthetic control can still be reported with a species classification of "Unknown".
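To illustrate how the two stages can disagree, here is a minimal sketch (not the thapbi-pict code) comparing a relaxed k-mer overlap test with a strict perfect-match lookup. The k-mer size, the 50% shared-k-mer cutoff, the example sequence and the function names are all hypothetical, chosen only for the example.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def kmer_match(read: str, control: str, k: int = 8, min_shared: float = 0.5) -> bool:
    """Relaxed test (hypothetical cutoff): enough of the read's k-mers occur in the control."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False
    shared = len(read_kmers & kmers(control, k))
    return shared / len(read_kmers) >= min_shared


def identity_match(read: str, references: dict[str, str]) -> str:
    """Strict test: only a perfect full-length match gets a name."""
    for name, ref in references.items():
        if read == ref:
            return name
    return "Unknown"


# Made-up 40bp synthetic control sequence, for illustration only.
control = "ACGTTGCAAGCTTAGGCTAACGTTAGCATCGGATCCTAGA"
references = {"synthetic control": control}
# One substitution in the middle, as a PCR or sequencing error might introduce.
read = control[:20] + "T" + control[21:]

print(kmer_match(read, control))         # True: most 8-mers are unaffected
print(identity_match(read, references))  # Unknown: not a perfect match
```

A single substitution only disturbs the k-mers that overlap it, so the relaxed test still accepts the read while the perfect-match classifier rejects it.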

That seems... unhelpful. Perhaps we should special-case anything pre-labelled during prepare-reads and skip classification, or use the pre-labelling as a fallback?
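The two options suggested here could be sketched as follows. This is a hypothetical helper, not part of thapbi-pict; the function name, the `mode` values and the example labels are assumptions made for illustration.

```python
from typing import Optional


def final_label(
    classifier_label: Optional[str],
    prelabel: Optional[str],
    mode: str = "fallback",
) -> str:
    """Combine a classifier result with a prepare-reads pre-label (hypothetical).

    mode="skip":     trust any pre-label and ignore the classifier result.
    mode="fallback": trust the classifier unless it returned "Unknown".
    """
    if mode == "skip":
        if prelabel is not None:
            return prelabel
        return classifier_label or "Unknown"
    if mode == "fallback":
        if classifier_label in (None, "Unknown") and prelabel is not None:
            return prelabel
        return classifier_label or "Unknown"
    raise ValueError(f"Unsupported mode: {mode!r}")


print(final_label("Unknown", "synthetic control"))  # synthetic control
print(final_label("Phytophthora infestans", None))  # Phytophthora infestans
print(final_label("Phytophthora infestans", "synthetic control", mode="skip"))  # synthetic control
```

In either mode, a read that the strict classifier rejected can still end up named as a synthetic control on the strength of the relaxed k-mer match alone.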

peterjc commented 2 years ago

On the other hand, applying the relaxed k-mer matching to synthetic controls when using an extreme classifier, like the perfect-match-based identity classifier, could also be surprising.