Open lvclark opened 3 years ago
@lvclark can I work on this issue?
@nk183 Sure, thanks for doing this!
Going from polyRAD
to updog
refmat
, sizemat
, and ploidy
, to make it clear how the output can be passed to the arguments of multidog
.refmat
should come from the $alleleDepth
slot of the RADdata
object. You can use the OneAllelePerMarker
function with commonAlleles = TRUE
to remove the common allele of each locus to avoid duplicate computations in updog
. You will also need to transpose the matrix so alleles are in rows and taxa are in columns.sizemat
can either be derived from $alleleDepth + $antiAlleleDepth
or from $locDepth
. It needs to have the same dimensions and names as refmat
.ploidy
can come from $possiblePloidies
although there will probably need to be an argument to specify ploidy as well, since polyRAD
supports multiple ploidies and updog
does not.Going from updog
to polyRAD
multidog
outputs a list of two items. inddf
has most of the information that is needed:
snp
and ind
columns can be used to generate column names and row names, respectively, for alleleDepth
.ref
and size
columns can be used to generate the values for alleleDepth
.alleles2loc
, or the function could test if adjacent SNPs have identical size
across all individuals. This is needed not only to generate the alleles2loc
vector, but also to determine if the alleleDepth
matrix needs to be padded out with alternate alleles.alleleNucleotides
slot.locTable
can be generated with locus names as the row names. An option for the user to add in chromosome and position would be helpful.possiblePloidies
can be generated from the ploidy listed in snpdf
. It needs to be a list even though it only contains one item.contamRate
can be taken from median(multidog_out$snpdf$seq)
.RADdata
object, the $posteriorProb
slot should be added. It should be a list of length 1 containing a named three-dimensional array with values from the Pr_k
columns of inddf
. (You can run through some of the tutorials or examples to see what a RADdata
object should look like after genotype calling.)$possiblePloidies
slot should be copied to the $priorProbPloidies
slot.Misc
Documentation of the RADdata
class: https://github.com/lvclark/polyRAD/wiki/RADdata
From your profile I'm not sure if you're new to R... If it is a new language to you, be aware that loops are very slow because the code gets reinterpreted on each iteration. Many functions and operations can process an entire vector/matrix/array at once, however.
If you are new to bioinformatics, what we're trying to accomplish is genotype calling, which in a diploid is basically determining whether an individual is AA, Aa, or aa at a particular site (AKA locus/marker/SNP/gene) in the genome. What we have is a random sample of DNA sequence, where the locus has usually been sequenced multiple times. The "read depth" is the number of times we see the sequence for a given allele or a given locus. Using that read depth, along with information about the population of individuals being studied, we can use Bayesian statistics to get a posterior probability of each genotype AA, Aa, and aa being the true genotype. In polyploids it is more complicated, for example in a tetraploid you could have AAAA, AAAa, AAaa, Aaaa, or aaaa.
Updog only supports two alleles per locus. polyRAD supports any number of alleles per locus, but treats them as "pseudo-biallelic", where each allele is treated as a marker and each read either belongs to that allele or does not. Hence multiple markers in updog might correspond to a single marker in polyRAD.
It would be nice to have some convenience functions to convert between a
RADdata
object and the input and output ofupdog
. This would allow users to take advantage of the file import and export options in polyRAD, while performing the genotype calling itself in updog (more accurate than polyRAD in some cases but much slower).If you would like to add this feature and make a pull request, just comment here and I will give any help and guidance that I can. In particular see the
multidog
andformat_multidog
functions in updog. See also the checklist for pull requests.