rafalab / bumphunter

Clarifying "distance" and "inside distance" #32

Open dsimps1993 opened 6 months ago

Hi there,

I'm really enjoying the tool so far; the per-region annotations are very concise.

I have a question about the "distance" and "inside distance" columns. The regions I'm annotating are each 100 bp wide. As I understand it, "distance" is the distance before the 5' end of the gene, while "inside distance" is the distance past the 5' end of the gene. But for a region that lies inside a gene, e.g. in an intron (which would be past the 5' end), I get a positive number for "distance" together with a positive or even a negative number for "inside distance". For example, I have these three regions within WASH7P:

chr1:17401-17500 - covers exon(s)
chr1:17501-17600 - inside intron
chr1:19201-19300 - inside intron

Their "distance" values are 12070, 11970 and 10270, and their "inside distance" values are 0, 5 and -834, respectively. I can't reconcile the size of these numbers with the fact that each region is only 100 bp wide. At some point I'd like to filter out any regions that are, say, more than 2 kb from a gene/feature, so it's important for me to understand exactly what these columns measure.
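To make the concern concrete, here is a rough sketch of the kind of filter I have in mind, written in plain Python purely for illustration. The coordinates and values are the three quoted above. The field names (`distance`, `inside_distance`) and the helper functions are placeholders I made up for this example, not the package's actual column names or API, and whether "distance" is even the right quantity to filter on is exactly what I'm unsure about.

```python
# Values copied from the three WASH7P regions quoted above.
# Field names are placeholders, not the package's actual column names.
regions = [
    {"chrom": "chr1", "start": 17401, "end": 17500, "distance": 12070, "inside_distance": 0},
    {"chrom": "chr1", "start": 17501, "end": 17600, "distance": 11970, "inside_distance": 5},
    {"chrom": "chr1", "start": 19201, "end": 19300, "distance": 10270, "inside_distance": -834},
]

MAX_DISTANCE_BP = 2000  # the "more than 2 kb" cutoff mentioned above


def width(region):
    """Width of a region, assuming 1-based inclusive coordinates."""
    return region["end"] - region["start"] + 1


def within_cutoff(region, cutoff=MAX_DISTANCE_BP):
    """Naive filter: keep a region only if |distance| is at most the cutoff."""
    return abs(region["distance"]) <= cutoff


widths = [width(r) for r in regions]
kept = [r for r in regions if within_cutoff(r)]

print(widths)     # widths of the three regions
print(len(kept))  # number of regions that survive the naive filter
```

With this naive filter none of the three regions survives, even though all of them are annotated as lying within the gene, which is why I need to understand the columns before filtering on them.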

Thanks again, let me know if you need me to clarify or rephrase anything.

Cheers,

Daniel