hallamlab / TreeSAPP

A Python package for gene-centric taxonomic and functional classification using phylogenetic placement
GNU General Public License v3.0

Replace BMGE with ClipKIT #67

Open cmorganl opened 3 years ago

cmorganl commented 3 years ago

ClipKIT is a new Python package for trimming multiple sequence alignments (MSAs). Its authors indicate that the trimmed MSAs generated by ClipKIT are more "desirable" (based on combined RF distance and bipartition support) than those from competing tools, including BMGE.

Using ClipKIT instead of BMGE would also simplify installation: the BMGE.jar file would no longer need to be packaged with TreeSAPP, and ClipKIT could instead be installed using pip or conda.

cmorganl commented 2 years ago

ClipKIT parameters and settings have been benchmarked using treesapp evaluate. The following code calculates a single error value per run for the classifications across all taxonomic ranks, weighted by the number of ranks to the correct taxon (i.e. taxonomic distance):

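# One clade-exclusion table per treesapp evaluate output directory
for f in *_evaluate*/final_outputs/clade_exclusion_performance.tsv
do
    echo "$f"
    # Sum column 5 x column 7 over all rows to get a single weighted error value
    awk '{sum+=$5*$7;} END {print sum;}' "$f"
done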

The parameter set with the lowest score will be used as the default.
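
For anyone repeating the comparison, here is a minimal sketch of how the lowest-scoring run could be picked automatically rather than by eye. It reuses the same column 5 x column 7 sum as the loop above. The function names `score_file` and `rank_runs` are hypothetical helpers and not part of TreeSAPP, and the sketch assumes the same directory layout as the loop.

```shell
# score_file (hypothetical helper): weighted error for one TSV, i.e. the sum of
# column 5 x column 7 over all rows. A non-numeric header row contributes 0.
# Fixed-point output keeps the values easy to sort numerically.
score_file() {
    awk '{sum += $5 * $7} END {printf "%.4f\n", sum}' "$1"
}

# rank_runs (hypothetical helper): print "score<TAB>path" for every evaluate
# output found under the current directory, lowest score first.
rank_runs() {
    for f in *_evaluate*/final_outputs/clade_exclusion_performance.tsv
    do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        printf '%s\t%s\n' "$(score_file "$f")" "$f"
    done | sort -k1,1n
}

# The first line is the run with the lowest weighted error.
rank_runs | head -n 1
```

Ties are not handled specially here; if two parameter sets score the same, whichever sorts first is printed.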