Open skeiii opened 3 days ago
Hi! You're absolutely right, it seems like the description was wrong. It seems to actually remove the 80% most heterogeneous sites. Like you deduced, to get rid of the 20% most heterogeneous sites, you need to run with f0.2.
I was not the original author of this script though, and feel weary of changing it without his knowledge. It's also in Perl, which is not really my fortee.
You can also check whether the total CHI2 score has reduced by applying
./alignment_pruner.pl --file input.faa --chi2_test | grep TOTAL
on the original alignment and the pruned alignment. It should have been reduced.
On another note, you may also be interested in this script, if you want to study compositional bias problems: https://github.com/Dalhousie-ICG/icg-shared-scripts/blob/main/visualize_compositional_bias.py
It is able to quantify per taxa which amino acids are enriched or depleted relative to the mean, and visualize it neatly in a bar chart, that is aligned to a phylogenetic tree.
Hi, I have been constructing an Alphaproteobacteria tree recently and using the alignment_pruner.pl to trim the 20% most heterogeneity sites.
According to the script,
And therefore, I use:
alignment_pruner_broCode.pl --file input.faa --chi2_prune f0.8 > pruned20_Alpa117.fasta
However, as compared to my original input file, instead of remain 80% of the sites, it only left with 20%. Original file: 18823 f0.8: 3765
I am wondering, am I using the wrong command for trimming the most compositional heterogeneity sites, or the description of the text in f# is not correct. To trim 20% of the sites, it should be f0.2.