proksee-project / proksee-cmd

Repo for Proksee Cmd Line Tools
Apache License 2.0

Merge dev/contamination into develop #26

Closed: emarinier closed this pull request 3 years ago

emarinier commented 3 years ago

This pull request adds basic contamination detection. In particular, it checks for disagreement between the identified major species and some of the largest assembled contigs.
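The sketch below shows what a disagreement check of this kind could look like. It is not the code merged in this pull request. It assumes that per-contig species calls and contig lengths are already available as dictionaries. The function name `find_disagreeing_contigs` and the `top_n` cutoff for "largest contigs" are hypothetical.

```python
from typing import Dict, List


def find_disagreeing_contigs(
    major_species: str,
    contig_species: Dict[str, str],
    contig_lengths: Dict[str, int],
    top_n: int = 5,
) -> List[str]:
    """Return IDs of the largest contigs whose species call differs from the major species.

    Hypothetical helper for illustration; top_n is an assumed cutoff, not a proksee default.
    """
    # Rank contig IDs by length, longest first, and keep the top_n largest.
    largest = sorted(contig_lengths, key=contig_lengths.get, reverse=True)[:top_n]
    # Flag large contigs assigned to a different species than the major one.
    # Contigs with no species call are skipped rather than flagged.
    return [
        contig_id
        for contig_id in largest
        if contig_id in contig_species and contig_species[contig_id] != major_species
    ]
```

For example, if the major species is *Escherichia coli* and the second-largest contig is called as *Salmonella enterica*, that contig is returned as a possible sign of contamination.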

To identify contamination in individual contigs, the multi-record assembly FASTA file is first split into single-record FASTA files (one per contig), which are then analyzed with RefSeq Masher.
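The splitting step can be sketched in plain Python as follows. This sketch is not taken from the pull request. The function name `split_fasta` and the `record_<n>.fasta` naming scheme are hypothetical. The exact RefSeq Masher invocation applied to each output file is not described here, so it is omitted.

```python
import os
from typing import List


def split_fasta(input_path: str, output_dir: str) -> List[str]:
    """Write each record of a multi-record FASTA file to its own FASTA file.

    Output files use a hypothetical record_<n>.fasta naming scheme.
    Returns the paths written, in input order.
    """
    os.makedirs(output_dir, exist_ok=True)
    paths: List[str] = []
    out = None
    try:
        with open(input_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A header line starts a new record: close the previous file, open the next.
                    if out is not None:
                        out.close()
                    path = os.path.join(output_dir, f"record_{len(paths) + 1}.fasta")
                    paths.append(path)
                    out = open(path, "w")
                if out is not None:
                    # Copy header and sequence lines; any text before the first header is ignored.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each returned path holds exactly one contig, so a per-contig species call can be made by running RefSeq Masher on each file in turn.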

This pull request addresses the following species identification and contamination issues: