popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0

deprecation process for old genetic maps/ids #719

Open grahamgower opened 3 years ago

grahamgower commented 3 years ago

We currently have a genetic map for HomSap with id HapMapII_GRCh37. But we've now moved to using GRCh38 coordinates for chromosome lengths. And actually, these older HapMap genetic maps now extend beyond the new chromosome lengths. We don't want to use mismatched coordinate spaces anyway, so the HapMap maps have been lifted over to GRCh38 (but still need to be uploaded to AWS, I think?). So, presumably we want to introduce a new genetic map with id HapMapII_GRCh38. We then either leave the old id in place, doing the wrong thing (#701), or we deprecate/remove the old GRCh37 id (which is an API-breaking change). And then if folks really want to match GRCh37-mapped data, they would need to use stdpopsim 0.1.2 (probably already necessary, without further changes in stdpopsim HEAD). This is getting a bit messy. Thoughts?

jeromekelleher commented 3 years ago

I think the main principles should be to avoid breaking people's code, where possible, and to facilitate reproducibility. It should always be possible to pip install a given version of stdpopsim, run some code, and get the same answer as you would have at the time of release. This means, IMO, that we need to store these old maps forever.

I agree we need a new genetic map with id HapMapII_GRCh38. I guess the right thing to do in the code is to put in a FutureWarning that HapMapII_GRCh37 will be removed from the API at some point (in, say, a year). It would also be good to emit a warning that this map does not agree with the current genome build (for #701).
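As a rough sketch of what that could look like (this is not stdpopsim's actual code: the `DEPRECATED_MAPS` registry, the `CURRENT_BUILD` constant and the `check_genetic_map_id` helper are all hypothetical names made up for illustration), something along these lines would emit both warnings using only the standard `warnings` module:

```python
import warnings

# Hypothetical registry: deprecated id -> (replacement id, build of the old map).
# These names are illustrative only and are not part of stdpopsim.
DEPRECATED_MAPS = {
    "HapMapII_GRCh37": ("HapMapII_GRCh38", "GRCh37"),
}

# Assumed build that the current chromosome lengths are taken from.
CURRENT_BUILD = "GRCh38"


def check_genetic_map_id(map_id):
    """Return map_id unchanged, warning if it is deprecated or mismatched."""
    if map_id in DEPRECATED_MAPS:
        replacement, old_build = DEPRECATED_MAPS[map_id]
        # Deprecation notice aimed at end users of the library.
        warnings.warn(
            f"Genetic map '{map_id}' is deprecated and will be removed in a "
            f"future release; use '{replacement}' instead.",
            FutureWarning,
            stacklevel=2,
        )
        # Separate warning about the coordinate-space mismatch.
        if old_build != CURRENT_BUILD:
            warnings.warn(
                f"Genetic map '{map_id}' uses {old_build} coordinates, but "
                f"chromosome lengths are from {CURRENT_BUILD}.",
                UserWarning,
                stacklevel=2,
            )
    return map_id
```

`FutureWarning` is shown by default to end users, unlike `DeprecationWarning`, which Python's default filters hide outside of `__main__`, so it seems the better fit for people running simulation scripts.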

hyanwong commented 3 years ago

Where are you getting the build 38 maps from, @grahamgower?

jeromekelleher commented 3 years ago

We lifted them over: see code here and #691

hyanwong commented 3 years ago

Thanks @jeromekelleher. I guess some documentation describing this would be helpful (it took a while to find older lifted-over files for use elsewhere; it would be nice to make this easy for others to find and use). I'm happy to help here.