Closed apragsdale closed 5 years ago
The only way to do this at the moment would be to translate the coordinates on-demand @apragsdale. This should be straightforward enough to do though:
recomb_map = msprime.RecombinationMap() # load this from whereever
for var in ts.variants():
genetic_pos = recomb_map.physical_to_genetic(var.site.position)
It might not be the most efficient, but I doubt it would be much of a time-sink in practise.
We don't have any concept of genetic coordinates within the tree sequence at the moment, just physical. I guess we could add support in two different ways:
recombination_map
instance to a TreeSequence class, which if present, would allow us to support things like site.genetic_position
and tree.genetic_interval
.I'm not sure it's a common enough requirement to justify this complexity though --- it doesn't seem that hard to translate the coordinates on the fly?
Thanks Jerome! That makes sense - admittedly this is a pretty niche requirement, so it's certainly not high on the priority list. What I have in mind is computing various LD statistics, which are shaped by genetic instead of physical distances between loci, so if I want to get LD as a function of genetic distance over all pairs of SNPs, I need to check a lot of pairwise distances. Repeatedly calling the physical to genetic coordinate function can be avoided by using recomb_map.physical_to_genetic
, like you said, and then caching and saving/loading those genetic positions separately for future use.
I guess it's only really useful if you're looking at LD or IBD, which might not justify adding this support at the moment. I'll close this unless you think it would be a useful feature.
It's a good idea @apragsdale, but not pressing enough to follow up on right now I think.
Is there currently a nice way to record the genetic positions (instead of only physical positions (bp)) in tree sequences? Suppose I simulate with some recombination map (or have data encoded as a tree sequence, and want to store genetic positions of sites or tree intervals using some external map) - what is the best way to have this information accessible within the tree sequence object?
For example, if I iterate through either mutations variants on the tree, I can get the physical position right away by
m.position
orv.site.position
, or the interval of a tree in bp bytree.interval
. If I want to also be able to get, say,m.genetic_position
ortree.genetic_interval
, I imagine this might not be supported. Would I instead need to copy each table, add this information to the metadata of tables that record coordinates, and create a new tree sequence from those tables?