rbturnbull / orthoflow

Orthoflow is a workflow for phylogenetic inference of genome-scale datasets of protein-coding genes.
https://rbturnbull.github.io/orthoflow/
Apache License 2.0

trimming step missing from workflow #35

Closed by JLSteenwyk 2 years ago

JLSteenwyk commented 2 years ago

After the sequences are aligned, sites with little phylogenetic information (e.g., gap-rich sites) should be removed from each multiple sequence alignment. In the workflow diagram, this is step 8, which requires ClipKIT. The basic command to run ClipKIT is: clipkit <input>

An additional argument, -o, can be used to specify the name of the output file. If possible, I think it would be great to save the stdout from trimming each alignment, since it provides helpful information (e.g., how much of the alignment was removed during trimming).
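As a rough illustration of the request above, here is a minimal Python sketch that runs `clipkit <input> -o <output>` on one alignment and writes ClipKIT's stdout to a log file. The function names, the `log_path` argument, and the file names in the usage note are hypothetical and are not taken from the orthoflow code base; the workflow itself may well wire this step up differently.

```python
import subprocess
from pathlib import Path


def build_clipkit_command(alignment, trimmed, executable="clipkit"):
    """Return the argument list for `clipkit <input> -o <output>`."""
    return [executable, str(alignment), "-o", str(trimmed)]


def trim_alignment(alignment, trimmed, log_path, executable="clipkit"):
    """Trim one alignment and save ClipKIT's stdout to log_path (hypothetical helper)."""
    command = build_clipkit_command(alignment, trimmed, executable)
    # check=True raises CalledProcessError if ClipKIT exits with an error.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # Keep the report so it can be inspected later.
    Path(log_path).write_text(result.stdout)
    return result.stdout
```

Calling `trim_alignment("gene1.aln.fa", "gene1.trimmed.fa", "gene1.clipkit.log")` would then leave the trimmed alignment and the captured stdout side by side, one pair per gene.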

Nucleotide sequences can be trimmed after running the thread_dna function in PhyKIT. In other words, it is the codon-based alignment that gets trimmed.
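The nucleotide route could be sketched in Python as follows. This assumes that `phykit thread_dna` takes the protein alignment via `-p` and the unaligned nucleotide sequences via `-n`, and that it prints the codon-based alignment to stdout; please check both against the PhyKIT documentation before relying on them. The function names and file arguments are hypothetical.

```python
import subprocess
from pathlib import Path


def build_thread_dna_command(protein_alignment, nucleotide_fasta):
    """Argument list for PhyKIT's thread_dna (the -p and -n flags are assumptions)."""
    return ["phykit", "thread_dna",
            "-p", str(protein_alignment),
            "-n", str(nucleotide_fasta)]


def thread_then_trim(protein_alignment, nucleotide_fasta, codon_alignment, trimmed):
    """Thread DNA onto the protein alignment, then trim the codon-based alignment."""
    threaded = subprocess.run(
        build_thread_dna_command(protein_alignment, nucleotide_fasta),
        capture_output=True, text=True, check=True,
    )
    # Assumes thread_dna writes the codon-based alignment to stdout.
    Path(codon_alignment).write_text(threaded.stdout)
    trim = subprocess.run(
        ["clipkit", str(codon_alignment), "-o", str(trimmed)],
        capture_output=True, text=True, check=True,
    )
    # Return ClipKIT's stdout so the caller can store the trimming report.
    return trim.stdout
```

Returning ClipKIT's stdout keeps the trimming report available for the codon-based alignments as well.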

rbturnbull commented 2 years ago

complete