phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0

Unconfident QC Check not behaving as expected #27

Closed: mgopez closed this issue 6 years ago

mgopez commented 6 years ago

For QC Check 4 (Unconfident Results), bio_hansel reports that the downstream subtypes do not exist within the DataFrame. After manually inspecting the detailed hansel results, I confirmed that the tiles for those downstream subtypes are present in the result.

Within subtyper.py, lines of the form:

st.non_present_subtypes = [x for x in possible_downstream_subtypes if x not in df['subtype']]

should be replaced with something like:

st.non_present_subtypes = [x for x in possible_downstream_subtypes if not df['subtype'].str.contains(x).any()]

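to properly check the DataFrame for the possible downstream subtypes. The `in` operator on a pandas Series tests membership in the index labels, not the values, so with a default integer index the original expression treats every subtype string as absent.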
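A minimal sketch of the difference, assuming a toy DataFrame with a default integer index and a string `subtype` column. The sample values and the names `by_index`, `by_contains`, and `by_exact` are made up for illustration and are not taken from subtyper.py. The `regex=False` argument is an addition not in the suggestion above, so that the dots in a subtype are matched literally. The exact-match variant is one possible alternative, not necessarily what the linked PR does.

```python
import pandas as pd

# Hypothetical stand-in for the detailed results; only 'subtype' matters here.
df = pd.DataFrame({'subtype': ['2', '2.1', '2.1.1', '2.1.1.2']})
possible_downstream_subtypes = ['2.1.1.1', '2.1.1.2', '1.1']

# Original check: `in` on a Series looks at the index labels (0..3),
# so no subtype string is ever found and all are reported as missing.
by_index = [x for x in possible_downstream_subtypes
            if x not in df['subtype']]

# Check suggested in this issue, with regex=False (an assumption added here)
# so that '.' is a literal dot. Note that this is a substring match.
by_contains = [x for x in possible_downstream_subtypes
               if not df['subtype'].str.contains(x, regex=False).any()]

# Alternative: exact match against the column values.
present = set(df['subtype'])
by_exact = [x for x in possible_downstream_subtypes if x not in present]

print(by_index)     # ['2.1.1.1', '2.1.1.2', '1.1']
print(by_contains)  # ['2.1.1.1']
print(by_exact)     # ['2.1.1.1', '1.1']
```

In this toy data, `'1.1'` counts as present under the substring check because it occurs inside `'2.1.1'`, while the exact-match check reports it as missing. Which behaviour is wanted depends on how the subtyping scheme names its subtypes, which this issue does not state.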

peterk87 commented 6 years ago

Resolved with PR https://github.com/phac-nml/bio_hansel/pull/28