SISRS identifies sites that are fixed within taxa and variable among taxa. It also notes whether each variable site is a singleton or a parsimony-informative (PI) site, writing all among-taxa variable sites (including singletons) to alignment.nex and only PI sites (no singletons) to alignment_pi.nex.
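To make the singleton/PI distinction concrete, here is a minimal sketch using the standard textbook definition: a site is parsimony-informative when at least two character states each occur in at least two taxa, and any other variable site is treated as a singleton. The function name `classify_site` and the set of missing-data characters are assumptions for illustration, not the SISRS implementation, which may classify sites differently.

```python
from collections import Counter

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def classify_site(column):
    """Classify one alignment column, given as one base per taxon.

    Returns "invariant", "singleton", or "pi".
    """
    # Count each observed state, ignoring missing data.
    counts = Counter(b for b in column.upper() if b not in MISSING)
    if len(counts) < 2:
        return "invariant"
    # PI: at least two states are each shared by two or more taxa.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "pi" if shared_states >= 2 else "singleton"
```

For example, the column AATT would be classified as PI, while AAAT would be a singleton because the T state is seen in only one taxon.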
Downstream of that step, however, changeMissing and loci use the alignment.nex data, which include many singleton sites (about 50% of sites in my primate analysis: 11M of 22M). The effect of including such sites downstream is not immediately clear, so I wanted to run some tests.
To test the effects of singleton sites on different downstream analyses, I added a bit of code that writes output from alignment_pi.nex in the same way as is done for alignment.nex. The alignment_pi.nex output goes to a separate sub-directory, and the base SISRS code still runs changeMissing and loci on the alignment.nex data (no change to how SISRS ran before).
I will use these data to test whether contigs containing different ratios of singletons/PI sites produce more or less accurate trees.
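One way the per-contig comparison could be set up is sketched below. It assumes each variable site is available as a (contig ID, alignment column) pair; that input shape, the contig names, and the helper names `is_pi` and `pi_fraction_by_contig` are hypothetical and are not part of SISRS. The PI test is the standard definition (two or more states, each present in two or more taxa).

```python
from collections import Counter, defaultdict


def is_pi(column, missing="N-?"):
    """True if at least two states each occur in two or more taxa."""
    counts = Counter(b for b in column.upper() if b not in missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def pi_fraction_by_contig(sites):
    """Return {contig_id: fraction of its variable sites that are PI}.

    `sites` is an iterable of (contig_id, column) pairs and is assumed
    to contain variable sites only (hypothetical input format).
    """
    tallies = defaultdict(lambda: [0, 0])  # contig -> [pi_count, total]
    for contig, column in sites:
        tallies[contig][0] += is_pi(column)
        tallies[contig][1] += 1
    return {c: pi / total for c, (pi, total) in tallies.items()}


# Example with made-up contig names and columns.
example = [("contig1", "AATT"), ("contig1", "AAAT"), ("contig2", "AACC")]
fractions = pi_fraction_by_contig(example)
```

Contigs could then be binned by this fraction before building trees from each bin.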
Note: I originally changed SISRS to use alignment_pi.nex by default, but I reverted that change. Native SISRS behavior is maintained, and the PI data are merely collected on the side.