Closed emarinier closed 2 years ago
@sciguy I think I've made all the requested changes, and I've also changed the JSON file to be camel case. Take a look and let me know if there's anything else missing you can think of, or anything you think needs to be changed.
"Older" Changes
Recent Additions
The evaluate and assemble commands now always run both the heuristic species-based evaluation and the NCBI RefSeq exclusion criteria evaluation.
Description of Evaluation Process
There are three evaluations performed by Proksee [command/assemble/evaluate]: a species-based heuristic evaluation, an NCBI RefSeq exclusion criteria-based heuristic evaluation, and a species-based machine learning evaluation.
The species-based heuristic evaluation works by comparing common assembly quality metrics (number of contigs, length, N50, and L50) against a database of curated assembly quality metrics derived from NCBI RefSeq assemblies. If the species is determined with confidence, the evaluation will check to see if each quality metric of the Proksee-pipeline generated assembly falls within an acceptable percentile range when compared to other curated assemblies of the same species.
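The per-metric percentile check described above can be sketched roughly as follows. This is only an illustration, not Proksee's actual implementation: the function names, the 5th to 95th percentile bounds, and the metric keys are all assumptions made for the example, and the reference values would come from the curated database rather than from a hand-written list.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Return the percentage (0-100) of reference values that are <= value."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def evaluate_metric(value, reference, low=5.0, high=95.0):
    """Return True if value falls within the [low, high] percentile range.

    The default bounds are placeholders, not Proksee's real thresholds.
    """
    rank = percentile_rank(value, reference)
    return low <= rank <= high


def evaluate_assembly(metrics, species_reference, low=5.0, high=95.0):
    """Check each metric independently against same-species reference values.

    metrics: dict of metric name -> value for the new assembly.
    species_reference: dict of metric name -> list of values from curated
    assemblies of the same species (hypothetical structure).
    """
    return {
        name: evaluate_metric(value, species_reference[name], low, high)
        for name, value in metrics.items()
    }
```

Each metric gets its own pass or fail result here, which matches the "each quality metric" wording above; how the individual results are combined into an overall verdict is left out of the sketch.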
The NCBI RefSeq exclusion criteria-based heuristic evaluation works by comparing common assembly quality metrics (number of contigs, length, N50, and L50) against RefSeq's exclusion criteria. That is, if the assembly's metrics don't meet the thresholds specified by RefSeq, the assembly would not be accepted into RefSeq.
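A threshold check of this kind might look like the sketch below. The class name, the metric keys, and the reason strings are hypothetical, and the thresholds are deliberately left as parameters: the real values should be taken from NCBI's published exclusion criteria, not from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionThresholds:
    """Threshold values for the exclusion check (supplied by the caller)."""
    max_contigs: int
    min_n50: int
    max_l50: int
    min_length: int
    max_length: int


def find_exclusions(metrics, thresholds):
    """Return a list of reasons the assembly would be excluded.

    An empty list means no threshold was violated. The metric keys
    (num_contigs, n50, l50, length) are assumptions for this sketch.
    """
    reasons = []
    if metrics["num_contigs"] > thresholds.max_contigs:
        reasons.append("too many contigs")
    if metrics["n50"] < thresholds.min_n50:
        reasons.append("N50 too low")
    if metrics["l50"] > thresholds.max_l50:
        reasons.append("L50 too high")
    if metrics["length"] < thresholds.min_length:
        reasons.append("assembly too short")
    if metrics["length"] > thresholds.max_length:
        reasons.append("assembly too long")
    return reasons
```

Returning the list of violated criteria, rather than a single boolean, makes it easier to report why an assembly failed.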
The species-based machine learning evaluation works much like the species-based heuristic evaluation, except that the assembly quality metrics are considered simultaneously in a machine learning context rather than evaluated individually.
Won't Do
Example JSON Files
(Updated 2022-10-13): Please note that they're saved as txt in order to upload to GitHub, but they're all assembly_info.json files.
staph_aureus.txt ERR234657.txt campy.txt