tskit-dev / msprime

Simulate genealogical trees and genomic sequence data using population genetic models
GNU General Public License v3.0

Add tags for file format versions #325

Closed ashander closed 6 years ago

ashander commented 6 years ago

If we can figure out a format for the tag so that it doesn't clash with the actual program version numbers, maybe this is a good idea? I'm happy to compile a list mapping commit hashes to file format versions.

As discussed in #322

jeromekelleher commented 6 years ago

Sounds like a great idea, thanks @ashander. I think a tag format of `HDF5_FORMAT_V{major}.{minor}` is probably good. If you create the mapping I'll do the gitting!

ashander commented 6 years ago

Cool. Here's the mapping.txt. If everything looks good (see the link below for more details, including snippets of diffs that correspond to these hashes), I think you can just paste `git tag` at the start of every line in mapping.txt and source it.

More details here, including the text files from which I made the mapping: https://github.com/jeromekelleher/msprime/compare/master...ashander:tag-file-format-version?expand=1
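The "paste `git tag` at the start of every line" step could be sketched in Python as below. This is only an illustration: it assumes each line of mapping.txt has the form `TAG SHA` (the actual file layout is not shown in this thread), and `tag_commands` is a hypothetical helper name. The two sample entries are copied from the tag listing later in this thread.

```python
def tag_commands(mapping_text):
    """Turn 'TAG SHA' lines into argument lists for `git tag TAG SHA`.

    Assumes one whitespace-separated 'TAG SHA' pair per line; this layout
    is an assumption, not the confirmed format of mapping.txt.
    """
    commands = []
    for line in mapping_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tag, sha = line.split()
        commands.append(["git", "tag", tag, sha])
    return commands


# Sample entries (tag name, abbreviated hash) as they appear in the thread.
mapping = """
HDF5_FORMAT_V8.0 299ddc9
HDF5_FORMAT_V7.0 ca9c0c5
"""

for cmd in tag_commands(mapping):
    print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` inside a clone of the repository; printing first makes it easy to review the commands before any tag is created.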

jeromekelleher commented 6 years ago

Impressive gitting @ashander! There's a slight issue here, in that we're tagging the first revision that supports a given file format rather than the last. It seems likely that the first revision will have some bugs in it, so perhaps we should change to the last instead? Is there a way of capturing this with git?

ashander commented 6 years ago

Thanks! Oh, but good point. I guess my map would only work if the last thing you did was change the version. More likely it's the first. If it was always the first thing changed, we can use `git tag {PREVIOUS VERSION} {SHA}^` on the commits where those lines were changed to get the commit just before the change.

For example, `git tag hdf8.0 fc17dbd^` (using a shorter tag format for illustration) will get us:

```
| * | | 5084839 Infrastucture and checks for multi-mutations.
| * | | fc17dbd Added mutation.parent to HDF5 & bumped version.
| * | | 0e655b4 (tag: hdf8.0) Minimal changes to support mutation.parent.
| * | | d59b6be Initial work on fixing simplify sites/mutations.
|/ / /
* | |   299ddc9 Merge pull request #274 from jeromekelleher/final-simplify
```

Do you think this will work?

Note that even if it does, using the hashes in the current mapping.txt won't quite work, as there were two intermediate changes. If this seems promising I can add those and provide a new mapping.

ashander commented 6 years ago

Oh. Looking at the graph in the previous comment it strikes me that it'd probably be better to get the first commit on master that is parent to a given change to the version lines. My git fu is too weak, currently, to do that

jeromekelleher commented 6 years ago

> Oh. Looking at the graph in the previous comment it strikes me that it'd probably be better to get the first commit on master that is parent to a given change to the version lines. My git fu is too weak, currently, to do that

Yes, what we really want is the parent of the commit where this hash was merged. What about using `git log --merges` to find the merge commit where the commit you've found from `git blame` was merged, and then taking the other parent? This should give the last state of master just before the HDF5 version was changed. (I definitely wouldn't have partial implementations of an HDF5 version on master.)
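To make the "find the merge, then take the other parent" idea concrete, here is a small pure-Python sketch over a toy commit graph. Everything in it is hypothetical: the commit names (`m0`, `m1`, `m2`, `work`, `bump`) and the helper `master_before_merge` are made up for illustration, and the graph is a plain dict rather than a real repository. In git terms, the value returned corresponds to the first parent of the merge commit (written `<merge>^1`).

```python
def master_before_merge(parents, master_tip, commit):
    """Walk master's first-parent chain and return the first parent of the
    merge that brought `commit` in, or None if no such merge is found.

    `parents` maps a commit name to its list of parents (first parent first).
    """
    def ancestors(start):
        # All commits reachable from `start`, including `start` itself.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, []))
        return seen

    node = master_tip
    while node is not None:
        ps = parents.get(node, [])
        if len(ps) >= 2:
            first, others = ps[0], ps[1:]
            # The merge we want: `commit` arrives via a non-first parent
            # and was not already reachable from master before the merge.
            if commit not in ancestors(first) and any(
                commit in ancestors(p) for p in others
            ):
                return first
        node = ps[0] if ps else None
    return None


# Toy history: a feature branch (work -> bump) merged into master at m2.
toy_parents = {
    "m2": ["m1", "bump"],  # merge commit on master
    "bump": ["work"],      # commit that changes the file format version
    "work": ["m1"],
    "m1": ["m0"],
    "m0": [],
}

print(master_before_merge(toy_parents, "m2", "bump"))
```

The sketch only covers version bumps that reach master through a merge; a bump committed directly on master returns `None` and would need separate handling.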

ashander commented 6 years ago

Ah nice. I think `git log --merges` on the commit that changes the HDF5 version will give the last commit on master. Not sure what the bit about 'take the other parent' means, but I don't think it's needed? I've pushed an update that uses that approach, and sourcing mapping.txt

... results in these tags:

snippets of git commit log of 5 lines around each tag

```
| * | | 5084839 Infrastucture and checks for multi-mutations.
| * | | fc17dbd (tag: drop) Added mutation.parent to HDF5 & bumped version.
| * | | 0e655b4 Minimal changes to support mutation.parent.
| * | | d59b6be Initial work on fixing simplify sites/mutations.
|/ / /
* | | 299ddc9 (tag: HDF5_FORMAT_V8.0) Merge pull request #274 from jeromekelleher/final-simplify
|\ \ \
| |/ /
|/| |
| * | dd5bb8a Updated imports for no numpy case.
| * | 83f4e36 FINALLY fixed segment leak in simplify!!!
--
| * | 63c49bb High-level support for zero-edge tree sequences.
| * | 8625c4e Changed initialisation for _msprime.TreeSequence.
| * | f548cc2 Low-level changes for empty tree sequences.
| * | b81c72d Added sequence_length attr to HDF5 & bumped version to 8.0.
|/ /
* | ca9c0c5 (tag: HDF5_FORMAT_V7.0) Merge pull request #271 from jeromekelleher/model-time-rescaling
|\ \
| * | 3b609e7 Minor bugfixes.
| * | cc5de32 Refactored lambda coalescents into main loop.
| * | e3d0e09 Closed loophole allowing for 0 population sizes.
| * | 392ee72 Properly catch errors in subprocess for verification.
--
| * | 4126fda Initial triply linked tree implementation.
| * | 82e7f78 Initial phase of edgeset->edge transition.
| * | 6d346a1 Added method to construct holey tree sequences.
| * | 4738d6e Initial support for tree seq's containing holes.
|/ /
* | 6310725 (tag: HDF5_FORMAT_V6.0) Merge pull request #218 from petrelharp/mutation_stats
|\ \
| | | * 6d24917 (remove-edgesets-patch) Fixed various issues with tree sequences with gaps.
| | | * f54bdad Triaged all C tests. Now passing.
| | | * 91bd8e5 Full triage of issues. Python tests passing.
| | | * 9045ae6 Modified test data text files.
--
| * 976d552 Finished low-level mutation_type changes.
| * 6e6b346 Initial low-level work for mutation types.
|/
| * 0742b59 (docs-tidy) Cleanup TODOs
|/
* 62659fb (tag: HDF5_FORMAT_V5.0) Merge pull request #157 from jeromekelleher/appveyor-tidy
|\
| * 4854e12 Tidy up appveyor config and add badge.
|/
* 979c698 Merge pull request #154 from jeromekelleher/appveyor
|\
--
|/
* 73f46b2 Merge pull request #136 from jeromekelleher/file-format-update
|\
| * 52d64ce Bumped HDF5 file format version to 5.
|/
* a586646 (tag: HDF5_FORMAT_V4.0) Merge pull request #133 from jeromekelleher/tables-update
|\
| * e29ce95 Removed the load_records() method.
| * fe2e1d3 Experimental high-level interfaces for new model.
| * 06c2906 Fixed various problems with name interface.
| * c21022e Low-level implementation of node name.
--
| * 4b6cf93 Added legacy dump & load for v4 format.
| * 4a4ea94 Updated v2 legacy format tests.
| * d2c6bdf Updated HDF5 tests for initial V4 support.
| * a47ec9c Removed mutational state from HDF5 and set version to 4.
|/
* 8bc2eae (tag: HDF5_FORMAT_V3.2) Merge pull request #121 from jeromekelleher/pin-pysam
|\
| * 2e5a739 Pinned pysam version to avoid issues with 0.10.0 on py33
|/
* 73daa36 Merge pull request #105 from jeromekelleher/recurrent-mutations-2
|\
--
|/
* cc7efcf Merge pull request #63 from jeromekelleher/many-demes
|\
| * ce489dd Support for up to 2**32 demes.
|/
* d69c059 (tag: HDF5_FORMAT_V3.1) Merge pull request #58 from petrelharp/add_file_option
|\
| * d6b6c08 Added tests and improved error handling.
| * 737c5ca Fixed PEP8 violations causing CI failure.
| * bbfef70 rejiggered argparse to allow -f flag
| * f7f9a5f both options for inserting files
--
* a0d453e Merge pull request #49 from jeromekelleher/finalise-file-format
|\
| * 9a45c07 Documented v3 HDF5 file format.
| * e760b6d Added back vestigial root attributes in HDF5 file.
|/
* 7befdcf (tag: HDF5_FORMAT_V3.0) Merge pull request #38 from jeromekelleher/fix-demography-debugger
|\
| * 3da4d1b Fix for demography debugger error. Closes #37.
|/
* 7b1a040 Merge pull request #45 from jeromekelleher/ldcalc
|\
--
| * 255d502 First steps on supporting non-binary trees.
| * 602e247 Improvements for low-level test coverage.
| * a168a47 Partial update of tree sequence for non-binary records.
| * 26119fa Minimal changes to support simulator with nonbinary records.
|/
* a26a227 (tag: HDF5_FORMAT_V2.1) Merge pull request #30 from jeromekelleher/fix-haplotypes-segfault
|\
| * 52f7ff2 Added missing MSP_ERR_NO_MEMORY.
* | d32b048 Merge pull request #28 from jeromekelleher/historical-samples
|\ \
| |/
--
| * 77f3a8c Fixed high level test.
| * d15d6e2 Partial high-level implementation.
| * 8a0e28f Rough working version of algorithm.
| * 291f230 Initial groundwork for historical samples.
|/
* 7c507f3 (tag: HDF5_FORMAT_V2.0) Merge pull request #27 from jeromekelleher/test-save-imported_treeseq
|\
| * 4cf3576 Improved test coverage on record import.
|/
* a3da1b0 (tag: 0.3.2) Changelog for 0.3.2.
* 829f25c Minor tidy up of tests and docs for load_txt.
--
* 61d52d4 Added position to variants iterator.
* b812b36 Added 'variants' command to msp cli.
* 15a9c4e Python APIs for variant strings.
* 6d66fd5 C library API for string variants.
* d5e1283 Added citation text to docs and CLI help.
* 04722d8 (tag: HDF5_FORMAT_V1.1, tag: HDF5_FORMAT_V1.0) Merge branch 'master' into develop
|\
| * 6f7cde1 (tag: 0.2.0) Updated changelog ready for 0.2.0.
* | dfe4758 Updated changelog ready for 0.2.0.
|/
* b5c3a20 Added variable recombination example.
--
* 8a5333a Fixed gross problems with docs.
* ef22314 Added num_replicates interface to simulate.
* e1b3de9 Implemented RNG architecture throughout.
* 85a59e2 Removed random_seed from simulator and JSON config.
* 91a39ae First pass at standalone RNG instance.
* f42215e (tag: HDF5_FORMAT_V0.3) Merge branch 'master' into develop
|\
| * b3848f2 (tag: 0.1.10) Updated changelog for 0.1.10 release.
| * b59846f Issue #7. Workaround for seeding bug for small n.
* | 973a971 Updated random seed.
* | b44bcac Updated tests to use new API.
--
* | d2db4ea Fixed bug in tree diff iterator.
* | 68480b9 Partial update to high-level interface.
* | 3df3bc3 Low-level tests working with new interface.
* | c4660de Fixed bug in mutation generator.
* | b4113fa Low-level changes for new sequence_length idea.
* | 26e049d (tag: HDF5_FORMAT_V0.2) Merge branch 'develop' into real-coords-experiment
|\ \
| * \ febc768 Merge branch 'master' into develop
| |\ \
| | |/
| | * 8deb2b7 (tag: 0.1.9) Updated CHANGELOG for 0.1.9 release.
--
* | 7d141c8 Intermediate point before trying mutation generator.
* | 4f657c0 Translation to physical coords mostly working.
* | dfde379 Added low-level plumbing for recomb map.
* | 2e5d0cd Filled in some C level frameworks.
* | c8e3fb3 Initial framework for genetic maps.
* | 34ac742 (tag: HDF5_FORMAT_V0.1) Merge branch 'master' into variable-recombination2
|\ \
| |/
| * 005a79c (tag: 0.1.8) Minor updates to CHANGELOG.
| * bcff53e Minor tidyups.
| * 83b4871 Added the initial changelog.
```

jeromekelleher commented 6 years ago

I've hit an annoying snag with this plan unfortunately. The issue is that setuptools_scm expects all tags to be for versions, and will break when it hits (what it considers to be) a malformed version. This means that we can check out these tags, but then we can't actually build them (which is useless).
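As a rough illustration of why the file format tags trip up version detection, the sketch below separates tags that look like plain release versions from those that do not. The regular expression is a deliberately simplified stand-in and is not setuptools_scm's actual parsing logic; `looks_like_version` and `split_tags` are hypothetical helper names. The tag names are taken from the listing in this thread.

```python
import re

# Simplified stand-in for a version parser: an optional "v" followed by
# dotted integers. This is NOT how setuptools_scm parses tags; it only
# illustrates the clash between release tags and file format tags.
VERSION_LIKE = re.compile(r"v?\d+(\.\d+)*")


def looks_like_version(tag):
    """Return True if the whole tag matches the simplified version pattern."""
    return VERSION_LIKE.fullmatch(tag) is not None


def split_tags(tags):
    """Split tags into (version-like, everything else), preserving order."""
    versions = [t for t in tags if looks_like_version(t)]
    others = [t for t in tags if not looks_like_version(t)]
    return versions, others


tags = ["0.3.2", "0.2.0", "HDF5_FORMAT_V8.0", "HDF5_FORMAT_V1.1"]
versions, others = split_tags(tags)
print("version-like:", versions)
print("not version-like:", others)
```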

I think the best option is to put a "Revision history" section into the [file format section](http://msprime.readthedocs.io/en/stable/file-format.html) of the docs, and put the mapping you've derived into a table there, along with a quick explanation of what it is and what it's for. What do you think @ashander?

jeromekelleher commented 6 years ago

Version 9 should go in as e504abd3a44d7da23bc0ffca3fa23671454ff720

ashander commented 6 years ago

On Mon, Dec 11, 2017 at 05:13 Jerome Kelleher notifications@github.com wrote:

> I've hit an annoying snag with this plan unfortunately. The issue is that setuptools_scm expects all tags to be for versions, and will break when it hits (what it considers to be) a malformed version. This means that we can check out these tags, but then we can't actually build them (which is useless).

Ah that is annoying!

> I think the best option is to put a "Revision history" section into the [file format section](http://msprime.readthedocs.io/en/stable/file-format.html) of the docs, and put the mapping you've derived into a table there, along with a quick explanation of what it is and what it's for. What do you think @ashander?

Sounds good, I can change my branch accordingly.

-- -Jaime

jeromekelleher commented 6 years ago

Closed in #326