While working on incorporating sequencing error into the estimation of genotype likelihoods, I made some fixes and changes that went into the main branch because they will be useful even if sequencing error is not ultimately incorporated. This pull request merges them back into the seqerror branch so I can continue working on it.
The internal function .nucdist was added, using code from MergeRareHaplotypes. This function calculates the number of mutations between haplotypes, and will be used by MergeRareHaplotypes, MergeIdenticalHaplotypes, and AddErrorMatrices.
MergeIdenticalHaplotypes was changed so that it merges not only alleles with completely identical alleleNucleotides, but also alleles that have zero distance between them based on IUPAC ambiguity codes (i.e., .nucdist returns zero). Several import functions now run MergeIdenticalHaplotypes immediately after running MergeRareHaplotypes. The need for this came up while testing AddErrorMatrices on the Miscanthus sacchariflorus dataset, when alleles "YTTTW" and "TTTTA" were both present in one locus generated by VCF2RADdata.
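
To illustrate why "YTTTW" and "TTTTA" count as zero distance apart, here is a minimal Python sketch of an ambiguity-aware distance. It is not the R code of .nucdist. The function name iupac_distance and the rule used (two positions differ only when their IUPAC codes share no base) are assumptions made for illustration, and insertion/deletion characters are not handled.

```python
# Illustrative sketch only; NOT polyRAD's .nucdist implementation.
# Assumption: two positions count as a mutation only when the sets of bases
# allowed by their IUPAC codes do not overlap.

# Standard IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def iupac_distance(hap1, hap2):
    """Count positions where two equal-length haplotypes cannot match.

    Hypothetical helper: a position contributes 1 to the distance only if
    the two codes have no base in common.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same length")
    return sum(
        1
        for a, b in zip(hap1.upper(), hap2.upper())
        if IUPAC[a].isdisjoint(IUPAC[b])
    )


if __name__ == "__main__":
    # Y = C or T and W = A or T, so both ambiguous positions are compatible.
    print(iupac_distance("YTTTW", "TTTTA"))  # 0
    # G is not covered by Y, so the first position counts as one mutation.
    print(iupac_distance("YTTTW", "GTTTA"))  # 1
```

Under this rule the two alleles from the Miscanthus sacchariflorus locus are not identical strings but are still indistinguishable, which is the case MergeIdenticalHaplotypes now handles.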