This is an amazing tool, and I ended up relying quite a lot on it due to its speed!
One improvement I would add is letting the user specify what a "pure base" is and what an "unknown" base is. This feature is inspired by two situations I run into often:
1) Often "-" actually represents a genuine polymorphism, and for non-phylogenetic analyses users may want to keep those positions in their SNP alignment.
2) I often use IUPAC ambiguity codes in my alignments (M,R,W...), and I would like positions that contain only REF + an IUPAC code to be kept as columns in the output.
I think the change would be relatively easy to implement. I changed the source code ("is_unknown" and "is_pure" in alignment-file.c) before compiling so that it suits my needs, but other users may benefit from this as well.
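
To make the request concrete, here is a minimal sketch of what user-defined base classes could look like. It is not the actual snp-sites code and not my exact patch: `base_config`, `is_pure_cfg`, `is_unknown_cfg` and `column_is_variable` are hypothetical names, and I am assuming that a column is kept when it contains at least two distinct "pure" characters after skipping "unknown" ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, user-configurable base classes (not part of snp-sites).
 * Both strings are expected to be upper case. */
typedef struct {
    const char *pure_bases;    /* characters counted as real alleles, e.g. "ACGT" */
    const char *unknown_bases; /* characters ignored when comparing, e.g. "N-?" */
} base_config;

/* Case-insensitive membership test of a base in a character set. */
static int in_set(const char *set, char base)
{
    assert(set != NULL);
    if (base == '\0') {
        return 0; /* strchr would otherwise match the string terminator */
    }
    return strchr(set, toupper((unsigned char)base)) != NULL;
}

int is_pure_cfg(const base_config *cfg, char base)
{
    return in_set(cfg->pure_bases, base);
}

int is_unknown_cfg(const base_config *cfg, char base)
{
    return in_set(cfg->unknown_bases, base);
}

/* Assumed rule: a column is variable (kept) if it holds at least two
 * distinct pure characters; unknown and unclassified characters are skipped. */
int column_is_variable(const base_config *cfg, const char *column, size_t n)
{
    char first = '\0';
    for (size_t i = 0; i < n; i++) {
        char b = (char)toupper((unsigned char)column[i]);
        if (is_unknown_cfg(cfg, b) || !is_pure_cfg(cfg, b)) {
            continue;
        }
        if (first == '\0') {
            first = b;
        } else if (b != first) {
            return 1;
        }
    }
    return 0;
}
```

With something like this, moving "-" from the unknown set to the pure set would keep gap-only polymorphisms, and adding the IUPAC letters to the pure set would keep REF + IUPAC columns. How the sets would be passed in (command-line options, for example) is left open here.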