tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.
MIT License
27 stars 23 forks source link

SLiM's multispecies branch breaks pyslim-based treeseq tests #246

Closed bhaller closed 2 years ago

bhaller commented 2 years ago

Hi @petrelharp. This is not urgent, but it'd be nice to get it fixed before too long so that GitHub's CI resumes being useful for me during development. :-> The problem can be seen here:

https://github.com/MesserLab/SLiM/runs/5471463981?check_suite_focus=true

This is on the new multispecies branch I just made on SLiM. I've basically done two things on that branch so far:

The new branch does build and run, but I guess I broke something in the treeseq tests that get run on GitHub Actions, as you can see above. I think the likely culprit is that I changed the schemas a bit. The new schemas are here:

https://github.com/MesserLab/SLiM/blob/45c0e9bbad0552e4efdd9218aee92241506e5953/core/slim_globals.cpp#L1500

Of course it's a bit hard to see what changed, since these schemas are barely human-readable :->; but basically, besides changes to the text descriptions of a few things, the important change is that in the top-level metadata (gSLiM_tsk_metadata_schema), a new tick key was added, in addition to the existing generation key, and the required keys list now lists tick but not generation, since the expectation is that the tick is the timescale that pyslim will now really care about, whereas the generation in which the tree sequence was saved is expected to be more a point of idle curiosity. :-> Note that the slim_time of mutations is now in ticks, too, as the changed description for their metadata indicates.

SLiM (on the multispecies branch) now writes out both tick and generation, so generation should not be missing from the files it saves out; it just has tick as well. I'm therefore not sure why these changes would have broken SLiM's treeseq self-tests; my initial hypothesis is that pyslim is testing for equality of schemas in some way, and the new schemas are violating that test even though (since generation is still present) the metadata is actually still compliant with what pyslim otherwise expects. Anyhow, I don't know, that's just a guess. :->

I have not incremented the treeseq format version number; it remains at 0.7. SLiM just rolls with what it's given; if a tick metadata value is present it uses it, if it is not present then it assumes that generation is both the generation value and the tick value (since then the file must have been created by a pre-multispecies SLiM). So SLiM does not seem to need a version bump to work with these changes; but if one would make your life easier in pyslim, just let me know.

I expect to be working on this branch for several months, at least, before I merge it back into master, so although this is not public yet and won't be for a long time, it'd be great if you could do a little pyslim release that rolls in support for this new stuff so that CI starts working again for me. I can only think of one other change that I plan to make to the metadata, which is adding a key for the name of the species saved in the file (and probably a string description of the species, too?). If that change would again break pyslim due to schema equality checks, then I can easily make those changes now so that the new .trees format is, as far as I anticipate, finalized; and I can bump the version number to 0.8 if that would be helpful. Thanks!

bhaller commented 2 years ago

OK, in my latest commit to the multispecies branch I have added name and description properties to Species, which get persisted (with the same names) to the top-level metadata of the tree sequence file. That means the schema strings changed again; see slim_globals.cpp:1500 for the current schemas on the metadata branch. This is the last schema change presently on my radar as a result of the multispecies changes (but who knows, really). I decided to increment the treeseq file version from 0.7 to 0.8, even though SLiM doesn't really need to know; it's good as an official marker of these changes, I guess, and maybe pyslim wants to know.

So, I think things are ready to go, whenever you're ready to add support for these changes to pyslim. Thanks!

bhaller commented 2 years ago

The multispecies branch is coming along nicely; multispecies models are actually running, and I'm working on documentation changes now. There is still a lot to be done (with InteractionType, mainly), but it is getting to the point where people might start playing around with it. It would therefore be nice if this issue got fixed at some point. :->

bhaller commented 2 years ago

I wonder if we should increment file_version from 0.7 to 1.0, instead of to 0.8? At some point we're going to have to go to 1.0, and SLiM 4.0 is probably as definitive a point as any to say "OK, this is feeling like a real, solid release point". Of course, nobody actually pays any attention to this file version except us, so it doesn't really matter. :->

bhaller commented 2 years ago

Documentation changes are now mostly finished, so it would really be nice to have these tests be green when you have a chance. I just need to get some recipes written, and then it will be ready to announce as a prerelease that people can play around with. I still need to deal with the fallout for InteractionType to make it really work well, but I'll do that as a separate second pass.

Probably other minor changes will be needed to the tests as well, due to breaks with backward compatibility; notably, early() is now required rather than optional. Some other changes might also affect you; let me know if anything is unclear. Thanks!

bhaller commented 2 years ago

Hi @petrelharp. The time has come. All breaking changes planned for SLiM 4 are committed to the multispecies branch, and I'm hoping to announce an internal beta release ASAP to get some testing from early adopters (coauthors on the paper and such). SLiMgui should autofix most issues with the test scripts, hopefully; let me know if there are issues it doesn't autofix that you think it should be able to do. A revised manual for 4.0 is at http://www.benhaller.com/slim/SLiM_Manual_4.0.pdf.zip that documents the new APIs. So what we need now is a working pyslim that handles the new schemas, and revised tests that don't fail. :-> Of course let me know if you hit any problems! Thanks!

petrelharp commented 2 years ago

Fixed!