lvclark / polyRAD

Genotype Calling with Uncertainty from Sequencing Data in Polyploids

Support multiploid populations #17

Closed lvclark closed 2 years ago

lvclark commented 3 years ago

For certain crops, like banana and yam, there seems to be a need for genotyping within populations of mixed ploidy. I have started some work on this, but it will involve some restructuring of RADdata objects, probably enough to justify incrementing the major version number of the software (going from polyRAD 1.3 to polyRAD 2.0, since objects will not be backwards-compatible).

I plan to support cases where ploidy is known ahead of time, e.g. from flow cytometry, but not ploidy estimation from the data. The Hind/He statistic can help identify accessions where ploidy was misidentified, but keep in mind that it is also highly influenced by inbreeding and hybridization. See https://github.com/delomast/tripsAndDipR for a package that estimates ploidy from sequencing data.
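To illustrate why Hind/He is useful for spotting misidentified ploidy and also why inbreeding confounds it, here is a rough Python sketch. It assumes that the expected value in a diversity panel is (ploidy - 1) / ploidy * (1 - F), with F the inbreeding coefficient. The function names, the tolerance, and the example values are all hypothetical and are not part of polyRAD.

```python
import math


def expected_hind_he(ploidy, inbreeding=0.0):
    """Expected Hind/He under the ASSUMED model (ploidy - 1) / ploidy * (1 - F)."""
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return (ploidy - 1) / ploidy * (1.0 - inbreeding)


def flag_suspect_ploidy(observed, declared_ploidy, tolerance=0.1):
    """Return taxa whose observed Hind/He is far from the outbred expectation.

    `tolerance` is an arbitrary illustrative cutoff, not a recommended value.
    """
    flagged = []
    for taxon, value in observed.items():
        expected = expected_hind_he(declared_ploidy[taxon])
        if abs(value - expected) > tolerance:
            flagged.append(taxon)
    return flagged


# Made-up values for three accessions; acc3 is declared tetraploid but
# has a Hind/He close to the diploid expectation.
observed = {"acc1": 0.49, "acc2": 0.74, "acc3": 0.52}
declared = {"acc1": 2, "acc2": 4, "acc3": 4}
suspects = flag_suspect_ploidy(observed, declared)

# Confounding: under the assumed model, a tetraploid with F = 1/3 has the
# same expectation as an outbred diploid.
confounded = math.isclose(expected_hind_he(4, 1 / 3), expected_hind_he(2))
```

Here `suspects` contains only `acc3`, but `confounded` being true shows that such a flag could equally reflect an inbred tetraploid, so a flag is a prompt to re-check the accession rather than proof of a ploidy error.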

The possiblePloidies slot will continue to represent different inheritance modes across the genome, so different markers can have different inheritance models. I plan to add a taxaPloidy slot, which will be an integer vector of individual ploidies, used as a multiplier on top of possiblePloidies.
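As a rough sketch of how the two slots could combine, the Python snippet below treats each inheritance mode as a tuple of subgenome copy numbers and scales it by each individual's multiplier. The plain elementwise multiplication (with no normalization) is an assumption made for illustration, and `effective_ploidies` is a hypothetical helper, not a polyRAD function.

```python
def effective_ploidies(possible_ploidies, taxa_ploidy):
    """Scale every inheritance mode by each taxon's multiplier.

    possible_ploidies: list of tuples, one tuple per inheritance mode,
        e.g. (2,) for one subgenome or (2, 2) for two subgenomes.
    taxa_ploidy: dict mapping taxon name to a positive integer multiplier.
    ASSUMPTION: the multiplier is applied elementwise, without normalization.
    """
    result = {}
    for taxon, multiplier in taxa_ploidy.items():
        if not isinstance(multiplier, int) or multiplier < 1:
            raise ValueError(f"multiplier for {taxon} must be a positive integer")
        result[taxon] = [
            tuple(multiplier * copies for copies in mode)
            for mode in possible_ploidies
        ]
    return result


# Two inheritance modes shared across the genome, two taxa with
# different multipliers (names and values are made up).
modes = [(2,), (2, 2)]
per_taxon = effective_ploidies(modes, {"A": 1, "B": 2})
# per_taxon["A"] == [(2,), (2, 2)]
# per_taxon["B"] == [(4,), (4, 4)]
```

Keeping the inheritance modes per marker and the multiplier per individual means neither needs to be duplicated across the other dimension.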

There will need to be some substantial changes to the biparental mapping population pipeline, in particular how ploidies of the parents and progeny are specified. The priorProbPloidies slot can possibly be eliminated.