Open grahamgower opened 3 years ago
I think the main principles should be to avoid breaking people's code, where possible, and to facilitate reproducibility. It should always be possible to pip install a given version of stdpopsim, run some code, and get the same answer as you would have at the time of release. This means, IMO, that we need to store these old maps forever.
I agree we need a new genetic map with id HapMapII_GRCh38. I guess the right thing to do in the code is to put in a FutureWarning that HapMapII_GRCh37 will be removed from the API at some point (in, say, a year). Also, emitting a warning that this map does not agree with the current genome build (for #701) would be good.
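A rough sketch of what the FutureWarning could look like, using only the standard library. The names `DEPRECATED_MAPS` and `resolve_genetic_map_id` are hypothetical and are not part of the stdpopsim API; the actual lookup code would need to be adapted.

```python
import warnings

# Hypothetical registry (an assumption for illustration):
# deprecated genetic map id -> suggested replacement id.
DEPRECATED_MAPS = {"HapMapII_GRCh37": "HapMapII_GRCh38"}


def resolve_genetic_map_id(map_id):
    """Return map_id unchanged, warning if it is scheduled for removal."""
    replacement = DEPRECATED_MAPS.get(map_id)
    if replacement is not None:
        warnings.warn(
            f"Genetic map '{map_id}' is deprecated and will be removed from "
            f"the API in a future release; consider '{replacement}' instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return map_id
```

The old id keeps working here, so existing code does not break; users just get advance notice. FutureWarning is shown by default, whereas DeprecationWarning is hidden in most contexts.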
Where are you getting the build 38 maps from, @grahamgower?
We lifted them over: see code here and #691
Thanks @jeromekelleher. I guess some documentation describing this would be helpful (it took a while to find older lifted-over files elsewhere; it would be nice to make this easy for others to find and use). I'm happy to help here.
Possibly @silastittes
We currently have a genetic map for HomSap with id HapMapII_GRCh37. But we've now moved to using GRCh38 coordinates for chromosome lengths. And actually, these older HapMap genetic maps now extend beyond the new chromosome lengths. We don't want to use mismatched coordinate spaces anyway, so the HapMap maps have been lifted over to h38 (but still need to be uploaded to AWS, I think?). So, presumably we want to introduce a new genetic map with id HapMapII_GRCh38. We then either leave the old id present and doing the wrong thing (#701), or we deprecate/remove the old h37 ID (which is an API-breaking change). And then if folks really want to match h37-mapped data, they would need to use stdpopsim 0.1.2 (probably already necessary, without further changes in stdpopsim HEAD). This is getting a bit messy. Thoughts?
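For illustration, the mismatch described above (a map extending past the end of the chromosome) could be detected with a check along these lines. This is only a sketch: `check_map_fits_chromosome` and its parameters are hypothetical names, not stdpopsim functions, and the caller is assumed to already know the last position in the map and the chromosome length.

```python
import warnings


def check_map_fits_chromosome(map_end, chrom_length, map_id, chrom_id):
    """Return True if the map ends within the chromosome, else warn and return False.

    map_end and chrom_length are positions in base pairs (assumed inputs).
    """
    if map_end > chrom_length:
        warnings.warn(
            f"Genetic map '{map_id}' extends to position {map_end} on "
            f"{chrom_id}, beyond the chromosome length {chrom_length}; "
            "the map may be in a different coordinate space than the "
            "current genome build.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

A check like this only catches the obvious case where the map overruns the chromosome; a map from another build that happens to be shorter would pass silently, so it is no substitute for recording the build alongside each map.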