openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

A method for getting the length of the overlap between two loci #260

Open · integratebio opened this issue 2 years ago


Hello developers, this is a feature request. It would be very helpful if the Locus object had a method that calculates the length of its overlap with another locus. Currently I mostly rely on PyRanges for this, or occasionally on a quick but crude workaround: converting each locus to a range of positions, turning the ranges into sets, and taking the length of their intersection. Being able to do these operations directly in pyensembl would greatly simplify overlap analysis.
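For reference, the set-based workaround mentioned above can be sketched as follows, next to the equivalent closed-interval arithmetic. This is a minimal illustration, not pyensembl code. SimpleLocus is a hypothetical stand-in for a locus with 1-based, inclusive start and end coordinates (the convention Ensembl uses). Treating loci on different contigs or strands as non-overlapping is an assumption of this sketch. The set version materializes every position, so its memory use grows with locus length. The arithmetic version returns the same count in constant time and memory.

```python
from typing import NamedTuple


class SimpleLocus(NamedTuple):
    """Hypothetical stand-in for a locus (1-based, inclusive start and end)."""
    contig: str
    start: int
    end: int
    strand: str


def overlap_length_via_sets(a: SimpleLocus, b: SimpleLocus) -> int:
    """The range -> set -> intersection workaround: one set entry per base."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0
    # range() excludes its stop value, so add 1 to keep the end position.
    positions_a = set(range(a.start, a.end + 1))
    positions_b = set(range(b.start, b.end + 1))
    return len(positions_a & positions_b)


def overlap_length_via_arithmetic(a: SimpleLocus, b: SimpleLocus) -> int:
    """Same count using closed-interval arithmetic, in O(1) time and memory."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0
    # The shared span is [max(starts), min(ends)]; empty spans clamp to 0.
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


x = SimpleLocus("17", 100, 200, "-")
y = SimpleLocus("17", 150, 300, "-")
print(overlap_length_via_sets(x, y), overlap_length_via_arithmetic(x, y))  # 51 51
```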

For example, if there are two loci:

from pyensembl import Locus

l1 = Locus(contig='17', start=78852977, end=78925387, strand='-')
l2 = Locus(contig='17', start=7552977, end=78905387, strand='-')

there could be a method that returns the length of their overlap. The syntax could be something like:

l1.overlap_length(l2) 
# returns the length of the overlap
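A rough sketch of how such a method might behave, using the two loci above. SimpleLocus and its overlap_length method are hypothetical: they are not part of pyensembl's API and only mirror the attribute names (contig, start, end, strand) that pyensembl's Locus exposes. The sketch assumes 1-based, inclusive coordinates. The match_strand flag is an illustrative design choice, since whether opposite-strand loci should count as overlapping depends on the analysis.

```python
from typing import NamedTuple


class SimpleLocus(NamedTuple):
    """Hypothetical stand-in for pyensembl's Locus (1-based, inclusive ends)."""
    contig: str
    start: int
    end: int
    strand: str

    def overlap_length(self, other: "SimpleLocus", match_strand: bool = True) -> int:
        """Number of positions shared with `other` (0 if they don't overlap)."""
        if self.contig != other.contig:
            return 0
        if match_strand and self.strand != other.strand:
            return 0
        # With inclusive ends, the shared span is [max(starts), min(ends)].
        return max(0, min(self.end, other.end) - max(self.start, other.start) + 1)


l1 = SimpleLocus(contig="17", start=78852977, end=78925387, strand="-")
l2 = SimpleLocus(contig="17", start=7552977, end=78905387, strand="-")
print(l1.overlap_length(l2))  # 52411
```

Here the overlap runs from l1's start (78852977) to l2's end (78905387), so it is 78905387 - 78852977 + 1 = 52411 bases. The computation is symmetric, so l2.overlap_length(l1) gives the same value.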

There could also be a simple way to get the length of a locus (e.g. l1.length), which could then be used to calculate the fraction of a locus that overlaps another one (with syntax something like l1.overlap_length_fraction(l2)).
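Below is one possible shape for the length and overlap-fraction helpers. It is again a hypothetical sketch rather than pyensembl's API. SimpleLocus, length, overlap_length, and overlap_length_fraction are illustrative names, coordinates are assumed to be 1-based and inclusive, and loci on different contigs or strands are assumed not to overlap. Note that the fraction is measured relative to the locus the method is called on, so it is not symmetric.

```python
from typing import NamedTuple


class SimpleLocus(NamedTuple):
    """Hypothetical stand-in for pyensembl's Locus (1-based, inclusive ends)."""
    contig: str
    start: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        # Inclusive coordinates: a locus with start == end covers one base.
        return self.end - self.start + 1

    def overlap_length(self, other: "SimpleLocus") -> int:
        """Number of shared positions; 0 for different contigs or strands."""
        if self.contig != other.contig or self.strand != other.strand:
            return 0
        return max(0, min(self.end, other.end) - max(self.start, other.start) + 1)

    def overlap_length_fraction(self, other: "SimpleLocus") -> float:
        """Fraction of *this* locus covered by `other` (not symmetric)."""
        return self.overlap_length(other) / self.length


l1 = SimpleLocus(contig="17", start=78852977, end=78925387, strand="-")
l2 = SimpleLocus(contig="17", start=7552977, end=78905387, strand="-")
print(l1.length)                                   # 72411
print(round(l1.overlap_length_fraction(l2), 4))    # 0.7238
print(round(l2.overlap_length_fraction(l1), 4))    # 0.0007
```

About 72% of l1 lies inside l2, but because l2 spans roughly 71 Mb, the same 52411 shared bases cover well under 0.1% of l2.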

I hope these simple but, I feel, essential add-ons will be integrated into pyensembl at some point. 🤞