kipoi / kipoiseq

Standard set of data-loaders for training and making predictions for DNA sequence-based models.
https://kipoi.org/kipoiseq/
MIT License

Implement `Variant` and `Interval` classes. Use them instead of `cyvcf2.Variant` and `pybedtools.Interval` #30

Avsecz closed 5 years ago

Avsecz commented 5 years ago

Currently, `VariantSeqExtractor` expects the interval to be a `pybedtools.Interval` and the variants to be `cyvcf2.Variant` objects. Both packages define these classes in Cython, which makes them very hard to instantiate directly, for example when using a different VCF parser or when manually introducing variants that are not present in the VCF file. Hence, I suggest that we implement our own `Interval` and `Variant` classes and give them two conversion classmethods: `from_pybedtools` and `from_cyvcf2`.
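To make the proposal concrete, here is a minimal sketch of what such classes could look like. It is not the implementation that ended up in kipoiseq: the field names, defaults, and the single-ALT restriction are assumptions made for illustration. The converters only read attributes that `pybedtools.Interval` (`chrom`, `start`, `end`, `name`, `strand`) and `cyvcf2.Variant` (`CHROM`, `POS`, `REF`, `ALT`, `ID`) expose, so the sketch runs without either package installed.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Interval:
    """Plain-Python genomic interval (hypothetical field layout)."""
    chrom: str
    start: int          # 0-based, inclusive (BED convention)
    end: int            # 0-based, exclusive
    name: str = ""
    strand: str = "."

    @classmethod
    def from_pybedtools(cls, interval):
        # Copies the attributes of a pybedtools.Interval-like object.
        return cls(chrom=interval.chrom, start=interval.start,
                   end=interval.end, name=interval.name,
                   strand=interval.strand)


@dataclass(frozen=True)
class Variant:
    """Plain-Python variant (hypothetical field layout)."""
    chrom: str
    pos: int            # 1-based position, as in the VCF file
    ref: str
    alt: str
    id: str = ""

    @classmethod
    def from_cyvcf2(cls, variant):
        # Assumption: only single-ALT records are supported in this sketch.
        if len(variant.ALT) != 1:
            raise ValueError("expected exactly one ALT allele")
        return cls(chrom=variant.CHROM, pos=variant.POS, ref=variant.REF,
                   alt=variant.ALT[0], id=variant.ID or "")


# A variant can now be created by hand, without any VCF file ...
manual = Variant("chr1", 1000, "A", "T")

# ... or converted from a parser record. SimpleNamespace stands in for a
# real cyvcf2.Variant here so the example has no external dependencies.
fake_record = SimpleNamespace(CHROM="chr1", POS=1000, REF="A",
                              ALT=["T"], ID=None)
converted = Variant.from_cyvcf2(fake_record)

fake_bed = SimpleNamespace(chrom="chr1", start=990, end=1010,
                           name="exon1", strand="+")
interval = Interval.from_pybedtools(fake_bed)
```

Because the classes are ordinary Python objects, supporting another parser would only require one more classmethod rather than constructing a Cython-backed object.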

s6juncheng commented 5 years ago

We already have these in MMSplice, and they can easily be adapted here:

https://github.com/gagneurlab/MMSplice/blob/1a59ee7198fb9397b8793017a615c32dfe430045/mmsplice/interval_tree.py#L10

https://github.com/gagneurlab/MMSplice/blob/1a59ee7198fb9397b8793017a615c32dfe430045/mmsplice/generic.py#L65

MuhammedHasan commented 5 years ago

49