proksee-project / proksee-cmd

Repo for Proksee Cmd Line Tools
Apache License 2.0

Use Mash Screen to Identify Species #21

Closed emarinier closed 3 years ago

emarinier commented 3 years ago

Currently, species detection can miss major contamination because it reports only the top five hits for the read set. If at least five of those hits are strains of the same species, a contaminating species may not appear at all.

We should use 'mash screen', probably via 'refseq_masher contains', to identify every species present in the read set and flag possible contamination. This will involve rewriting parts of organism_detection.py.
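A minimal sketch of the species-level flagging idea, assuming the 'refseq_masher contains' results have already been saved as tab-separated text. The column names `taxonomic_species` and `identity`, the helper names, the sample rows, and the 0.9 threshold are all assumptions for illustration; this is not the actual organism_detection.py code.

```python
import csv
import io


def summarize_species(tsv_text, species_col="taxonomic_species", identity_col="identity"):
    """Collapse strain-level hits to the best identity seen per species.

    The column names are assumptions about the tabular output; adjust them
    to match the real header.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        species = row[species_col].strip()
        if not species:
            continue  # skip hits with no species-level assignment
        identity = float(row[identity_col])
        if species not in best or identity > best[species]:
            best[species] = identity
    # Highest identity first, so the first entry is the presumed major species.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def flag_contamination(species_hits, min_identity=0.9):
    """Return every species other than the top one at or above the threshold.

    The 0.9 default is an arbitrary placeholder, not a validated cutoff.
    """
    return [name for name, identity in species_hits[1:] if identity >= min_identity]


# Hypothetical rows: six strains of one species plus one contaminant.
# A plain "top five hits" report would show only the first species.
_rows = [
    ["sample", "taxonomic_species", "identity"],
    ["reads", "Escherichia coli", "0.99"],
    ["reads", "Escherichia coli", "0.98"],
    ["reads", "Escherichia coli", "0.97"],
    ["reads", "Escherichia coli", "0.96"],
    ["reads", "Escherichia coli", "0.95"],
    ["reads", "Escherichia coli", "0.94"],
    ["reads", "Salmonella enterica", "0.93"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in _rows)

species_hits = summarize_species(EXAMPLE_TSV)
contaminants = flag_contamination(species_hits)
```

Grouping by species before ranking means that many strain hits to the major species cannot crowd a second species out of the report.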

Consider leveraging:

emarinier commented 3 years ago

Resolved in #24.