phac-nml / cladeomatic

SNP population structure detection
Apache License 2.0

Genotyping results from k-mer and SNP schemes differ #2

Open jrober84 opened 1 year ago

jrober84 commented 1 year ago

Testing has shown that some samples are not assigned any genotype with the k-mer scheme file but are correctly typed with the SNP scheme. This is likely because the k-mer scheme is missing a partial assignment for these genotypes, whereas the SNP-based scheme preserves this information.

jrober84 commented 1 year ago

When there is a diagnostic position but a sequence is missing a character at that position, k-mer rule generation fails to produce a rule for that position and base combination. When the genotyping tool then sees a sequence with that alt state, it calls it a difference, since that alt base is not in the list of allowed states. K-mer rule generation should be updated to fully assign rules in cases where a single state exists and the other is just missing data. Additionally, the genotyping tool needs to be updated to handle edge-case rules where both states should be allowed when a given genotype is not reported as having either state.
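
A minimal Python sketch of the edge case described above, for illustration only. It does not use cladeomatic's actual code or scheme format: the rule layout (`(position, base)` mapped to a set of allowed genotypes), the function names `build_rules` and `compatible_genotypes`, the `permissive` flag, and the toy data are all assumptions. It shows how a genotype with only missing data at a position ends up with no rule, and how allowing both states for that genotype would avoid the spurious mismatch.

```python
# Hypothetical sketch; names and data layout are assumptions, not cladeomatic's API.
MISSING = {"-", "N"}


def build_rules(observed, positions, permissive):
    """Build (position, base) -> set of genotypes allowed to carry that base.

    observed:   genotype -> {position -> set of characters seen in its members}
    positions:  position -> (ref_base, alt_base)
    permissive: if True, a genotype with no observed state at a position
                is allowed both states instead of none.
    """
    rules = {}
    for pos, states in positions.items():
        for base in states:
            rules[(pos, base)] = set()
        for genotype, calls in observed.items():
            seen = calls.get(pos, set()) - MISSING
            if seen:
                # Genotype has real data here: allow only the observed states.
                for base in seen & set(states):
                    rules[(pos, base)].add(genotype)
            elif permissive:
                # Only missing data here: do not penalise either state.
                for base in states:
                    rules[(pos, base)].add(genotype)
    return rules


def compatible_genotypes(sample, rules, genotypes):
    """Return the genotypes whose rules accept every called base in the sample."""
    candidates = set(genotypes)
    for pos, base in sample.items():
        if base in MISSING:
            continue  # missing data in the sample is not evidence either way
        candidates &= rules.get((pos, base), set())
    return candidates


# Toy data: genotype "1.1" is diagnostic at position 100 but has only
# missing data at position 200.
positions = {100: ("A", "G"), 200: ("C", "T")}
observed = {
    "1.1": {100: {"G"}, 200: {"-"}},
    "1.2": {100: {"A"}, 200: {"C"}},
}
sample = {100: "G", 200: "T"}

strict = compatible_genotypes(sample, build_rules(observed, positions, False), observed)
relaxed = compatible_genotypes(sample, build_rules(observed, positions, True), observed)
print("strict rules:    ", strict)
print("permissive rules:", relaxed)
```

With this toy data, the strict rules leave the sample untyped because no genotype is listed for `T` at position 200, while the permissive rules still assign it to `1.1`. Whether the fix belongs in rule generation, in the genotyping step, or in both is the open question in this issue.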