kipoi / kipoiseq

Standard set of data-loaders for training and making predictions for DNA sequence-based models.
https://kipoi.org/kipoiseq/
MIT License

Phased dataloader #78

Open Hoeze opened 4 years ago

Instead of making one prediction per variant, with phased data we know which variants lie on the same chromosome copy (haplotype). We can therefore combine them into a single alternative sequence per chromosome copy.
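A minimal sketch of this idea in plain Python follows. It is not kipoiseq's API: `PhasedVariant`, `apply_variants` and `haplotype_sequences` are hypothetical names. It assumes biallelic variants with 1-based, VCF-style positions, a reference interval that starts at a 0-based coordinate, and variants that do not overlap within a haplotype.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhasedVariant:
    """Hypothetical variant record: 1-based VCF position, REF/ALT alleles and a
    phased biallelic genotype, e.g. (0, 1) for the VCF genotype "0|1"."""
    pos: int
    ref: str
    alt: str
    genotype: Tuple[int, int]


def apply_variants(ref_seq: str, interval_start: int,
                   variants: List[PhasedVariant]) -> str:
    """Apply non-overlapping variants to ref_seq, whose first base sits at the
    0-based coordinate interval_start. Variants are applied right to left so
    that indels do not shift the offsets of variants still to be applied."""
    seq = ref_seq
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        offset = v.pos - 1 - interval_start  # 1-based VCF pos -> index in ref_seq
        if seq[offset:offset + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        seq = seq[:offset] + v.alt + seq[offset + len(v.ref):]
    return seq


def haplotype_sequences(ref_seq: str, interval_start: int,
                        variants: List[PhasedVariant]) -> Tuple[str, str]:
    """Return one alternative sequence per chromosome copy: haplotype h carries
    exactly the variants whose genotype has the ALT allele (1) at index h."""
    return tuple(
        apply_variants(ref_seq, interval_start,
                       [v for v in variants if v.genotype[h] == 1])
        for h in (0, 1)
    )
```

For example, with `ref_seq = "ACGTACGTAC"` starting at 0-based position 100, an SNV `C>T` at 102 on the first copy, an insertion `A>AGG` at 105 on the second copy and a deletion `TA>T` at 108 on both copies yield `("ATGTACGTC", "ACGTAGGCGTC")`: two sequences instead of three single-variant predictions.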

Since this requires phased variants, a problem arises when they are mixed with non-phased variants. From previous discussions, there are basically three ways to handle them:
1) Ignore non-phased variants and only emit a warning
2) Assign them to a random haplotype
3) Estimate the haplotype, e.g. with linkage disequilibrium

We concluded that users should make sure their data is correctly phased, so option 1) is the most straightforward solution.
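Option 1) could look roughly like the sketch below. It assumes genotypes arrive as VCF `GT` strings, where `|` separates alleles of a phased call and `/` those of an unphased one; `parse_phased_gt` and `keep_phased` are hypothetical helpers, not part of kipoiseq. Genotypes with missing alleles (`.`) are dropped as well, since they cannot be placed on a chromosome copy.

```python
import warnings
from typing import Iterable, List, Optional, Tuple


def parse_phased_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Return allele indices for a phased GT such as "0|1", or None if the
    genotype is unphased ("0/1") or has a missing allele ("." or "./.")."""
    if "/" in gt or "." in gt:
        return None
    return tuple(int(allele) for allele in gt.split("|"))


def keep_phased(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, Tuple[int, ...]]]:
    """Keep only (variant_id, GT) records with a phased genotype. Everything
    else is ignored, and a single warning reports how many were dropped."""
    kept = []
    dropped = 0
    for variant_id, gt in records:
        alleles = parse_phased_gt(gt)
        if alleles is None:
            dropped += 1
            continue
        kept.append((variant_id, alleles))
    if dropped:
        warnings.warn(
            f"Ignoring {dropped} unphased or missing genotype(s); "
            "make sure the input is phased.",
            UserWarning,
        )
    return kept
```

Warning once per call, rather than once per variant, keeps the output readable when a large, partially phased file is loaded.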