This pull request improves species and contamination estimation by:
Raising the requirements for estimating 'all' species. The thresholds are now: 1x multiplicity, 5% shared k-mers, and 0x coverage.
Calling Mash directly instead of through RefSeq Masher (RSM). This allows Mash to process multiple files in a single run (RSM forces each file to be run separately).
RefSeq Masher's Mash database and its Mash output processing are still used to assign species names; this is now done by calling the appropriate RSM functions directly.
Running contamination checks on up to 5 'chunks' instead of on the 5 largest contigs. Each chunk should contain approximately 1/5 of the assembled contigs, which should help reduce false-positive contamination calls.
Clarifying some variable names in contamination_handler.py that were called "estimations" when they actually held Species.
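To illustrate the stricter 'all' species requirements, here is a minimal sketch of a filter applying the three thresholds (1x multiplicity, 5% shared k-mers, 0x coverage). The `MashHit` record, its field names, the `coverage` metric, and the choice of inclusive (`>=`) comparisons are all hypothetical assumptions for illustration, not the project's actual data structures.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record for one Mash hit; field names are illustrative only.
@dataclass
class MashHit:
    species: str
    shared_kmers: int    # numerator of Mash's shared-hashes value, e.g. 60 in "60/1000"
    total_kmers: int     # denominator of the shared-hashes value
    multiplicity: float  # k-mer multiplicity reported for the hit
    coverage: float      # hypothetical coverage estimate

# Thresholds taken from the PR description; inclusive comparison is an assumption.
MIN_MULTIPLICITY = 1.0
MIN_SHARED_FRACTION = 0.05
MIN_COVERAGE = 0.0

def passes_all_species_thresholds(hit: MashHit) -> bool:
    """Return True if a hit meets every 'all' species requirement."""
    if hit.total_kmers == 0:
        return False  # avoid dividing by zero on an empty sketch
    shared_fraction = hit.shared_kmers / hit.total_kmers
    return (hit.multiplicity >= MIN_MULTIPLICITY
            and shared_fraction >= MIN_SHARED_FRACTION
            and hit.coverage >= MIN_COVERAGE)

def filter_all_species(hits: List[MashHit]) -> List[MashHit]:
    """Keep only the hits that pass all three thresholds."""
    return [hit for hit in hits if passes_all_species_thresholds(hit)]
```

A hit with 40/1000 shared k-mers (4%) would be dropped even with high multiplicity, while one with 60/1000 (6%) and 1x multiplicity would be kept.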
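Calling Mash directly means one process can handle every assembly. The sketch below assumes the distance mode (`mash dist`), whose usage is `mash dist <reference> <query> [<query>] ...` and whose tab-separated output columns are reference ID, query ID, Mash distance, p-value, and matching hashes. The database path `refseq.msh` and the helper names are hypothetical; the actual invocation in this PR may differ.

```python
from typing import Dict, List, Sequence, Tuple

def build_mash_dist_command(database: str, queries: Sequence[str],
                            threads: int = 1) -> List[str]:
    """Build a single `mash dist` command covering every query file."""
    if not queries:
        raise ValueError("at least one query file is required")
    # All queries go into one invocation instead of one Mash run per file.
    return ["mash", "dist", "-p", str(threads), database, *queries]

def parse_mash_dist_output(text: str) -> Dict[str, List[Tuple[str, float, float, int, int]]]:
    """Group `mash dist` output lines by query ID."""
    results: Dict[str, List[Tuple[str, float, float, int, int]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        reference, query, distance, p_value, shared = line.split("\t")
        matching, total = shared.split("/")  # e.g. "900/1000"
        results.setdefault(query, []).append(
            (reference, float(distance), float(p_value), int(matching), int(total)))
    return results
```

The command could then be run with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and its `stdout` passed to `parse_mash_dist_output`, leaving species naming to RSM's existing processing.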
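The chunking step for contamination checking can be sketched as splitting the assembled contigs into at most five groups of roughly equal count. This is a minimal sketch under assumptions: it splits by number of contigs (not total length) and keeps contigs in their original order; `split_into_chunks` is a hypothetical name, not necessarily how the PR implements it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_into_chunks(contigs: Sequence[T], max_chunks: int = 5) -> List[List[T]]:
    """Split contigs into up to `max_chunks` groups of near-equal size."""
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    # Never create more chunks than there are contigs.
    n_chunks = min(max_chunks, len(contigs))
    if n_chunks == 0:
        return []
    base, extra = divmod(len(contigs), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional contig.
        size = base + (1 if i < extra else 0)
        chunks.append(list(contigs[start:start + size]))
        start += size
    return chunks
```

For example, 12 contigs yield chunks of sizes 3, 3, 2, 2, 2, so every contig is checked rather than only the five largest.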