Phased dataloader

Instead of making one prediction per variant, for phased data we know which variants lie on the same DNA strand, so we can combine them into a single alternative sequence per chromosome copy.
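To make the idea concrete, here is a minimal sketch of how phased variants could be merged into one alternative sequence per chromosome copy. It is not kipoiseq's API: the `PhasedVariant` class and the `haplotype_sequences` function are hypothetical names, and the sketch assumes diploid, biallelic, non-overlapping variants with 0-based positions relative to the extracted reference sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhasedVariant:
    """Hypothetical container for one phased, biallelic variant."""
    pos: int             # 0-based offset into the reference sequence
    ref: str             # reference allele
    alt: str             # alternative allele
    gt: Tuple[int, int]  # allele per chromosome copy, e.g. (0, 1) for "0|1"


def haplotype_sequences(ref_seq: str,
                        variants: List[PhasedVariant]) -> Tuple[str, str]:
    """Return one alternative sequence per chromosome copy."""
    haplotypes = []
    for copy in (0, 1):
        seq = ref_seq
        # Apply variants from right to left so that indels do not shift
        # the coordinates of variants that still have to be applied.
        for v in sorted(variants, key=lambda v: v.pos, reverse=True):
            if v.gt[copy] != 1:
                continue  # this copy carries the reference allele
            end = v.pos + len(v.ref)
            if seq[v.pos:end] != v.ref:
                raise ValueError(f"reference mismatch at position {v.pos}")
            seq = seq[:v.pos] + v.alt + seq[end:]
        haplotypes.append(seq)
    return haplotypes[0], haplotypes[1]


variants = [
    PhasedVariant(pos=1, ref="C", alt="T", gt=(1, 0)),    # only on copy 0
    PhasedVariant(pos=4, ref="A", alt="G", gt=(1, 1)),    # on both copies
    PhasedVariant(pos=6, ref="G", alt="GTT", gt=(0, 1)),  # insertion on copy 1
]
hap0, hap1 = haplotype_sequences("ACGTACGT", variants)
```

With this input, each chromosome copy yields a single sequence that carries all of its variants at once, so a model would be run twice per region rather than once per variant.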

Because this requires phased variants, a problem arises when they are mixed with unphased variants. From previous discussions, there are basically three ways to handle the unphased ones:
1) Ignore unphased variants and only emit a warning
2) Assign them to a random strand
3) Estimate the strand, e.g. from linkage disequilibrium

We concluded that users should make sure their data is correctly phased, and therefore option 1) is the most straightforward solution.
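Option 1) could look roughly like the following sketch. It relies only on the VCF convention that `|` separates phased alleles and `/` separates unphased ones. The function name `keep_phased` and the `(variant_id, GT string)` input format are assumptions made for illustration, not part of kipoiseq.

```python
import warnings
from typing import Iterable, List, Tuple


def keep_phased(
    genotypes: Iterable[Tuple[str, str]]
) -> List[Tuple[str, Tuple[int, int]]]:
    """Keep phased diploid genotypes; warn once about everything else.

    `genotypes` yields (variant_id, GT string) pairs, e.g. ("rs1", "0|1").
    """
    kept = []
    skipped = []
    for variant_id, gt in genotypes:
        alleles = gt.split("|")
        # Unphased calls ("0/1") do not split into two parts here, and
        # missing calls (".|.") fail the isdigit check; both are skipped.
        if len(alleles) == 2 and all(a.isdigit() for a in alleles):
            kept.append((variant_id, (int(alleles[0]), int(alleles[1]))))
        else:
            skipped.append(variant_id)
    if skipped:
        warnings.warn(
            f"Ignoring {len(skipped)} variant(s) without a phased diploid "
            f"genotype: {', '.join(skipped)}"
        )
    return kept


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    phased = keep_phased(
        [("v1", "0|1"), ("v2", "0/1"), ("v3", "1|1"), ("v4", ".|.")]
    )
```

Collecting the skipped variants and warning once at the end keeps the output readable when a file contains many unphased calls.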

kipoi / kipoiseq

Phased dataloader #78