popsim-consortium / demes-python

Tools for describing and manipulating demographic models.
https://popsim-consortium.github.io/demes-docs/
ISC License
19 stars 6 forks source link

DemeGraph methods: split(), branch(), admix(), and merge() #97

Closed grahamgower closed 3 years ago

grahamgower commented 3 years ago

[EDIT: fat fingers... I somehow submitted the issue before writing it]

The DemeGraph has split(), branch(), admix(), and merge(), methods, but the semantics of how these could be used are not clear. For example, we have the split() method.

    def split(self, *, parent, children, time):
        """
        Add split event at a given time. Split events involve a parental deme
        whose end time equals the start time of all children demes.

        :param str parent: The ID of the parent deme.
        :param children: A list of IDs of the descendant demes.
        :type children: list of str
        :param float time: The time at which split occurs.
        """
        for child in children:
            # check parent/children relationship and end/start times
            if child == parent:
                raise ValueError("cannot be ancestor of own deme")
            if self[parent].end_time != self[child].start_time:
                raise ValueError(
                    f"{parent} and {child} must have matching end and start times"
                )
            # the ancestor of each child population is set
            self[child].ancestors = [parent]
        self.splits.append(Split(parent=parent, children=children, time=time))

This suggests that the parent and children must already exist in the deme graph (a ValueError will be raised for non-existent children in self[child]). So the children and the parent were already added with calls to deme(), but the ancestors field wasn't used? To me, these methods feel redundant at best, and confusing at worst.

Presumably, these methods should be internal to get_demographic_events() and not exposed in the Python API?

jeromekelleher commented 3 years ago

I don't have a good feel for how the Python API is supposed to be used right now so :shrug: Hopefully I'll be more helpful once I've done the ms converter and got my hands dirty a bit.

apragsdale commented 3 years ago

Presumably, these methods should be internal to get_demographic_events() and not exposed in the Python API?

That's right, I think. The way things currently work, is you could 1) define demes and include the ancestors for the child demes for a split event, e.g., or 2) define the demes without setting ancestors and then call split, e.g., to set the ancestors and check that the time intersection is correct.

Some of the events, such as merge, also have the effect of truncating parental demes to end at the time of the merger event, so you could do something like:

import demes
g = demes.DemeGraph(description="test", time_units="generations")
g.deme(id="parent1", initial_size=1000)
g.deme(id="parent2", initial_size=1000)
g.deme(id="child", initial_size=1000, start_time=10)
g.merge(parents=["parent1", "parent2"], child="child", proportions=[0.5, 0.5], time=10)

Then if you print(g.asdict_compact()), you'll see that ancestors, proportions, and times have been set to create a valid merge event. I set that up when we had goals of allowing flexible incremental building of a DemeGraph.

It might make the most sense to enforce that a DemeGraph is constructed only with option 1, and that those demographic events are functions within get_demographic_events(). I think we still want to have those functions somewhere, as lists of different demographic events are used in moments/dadi/fwdpy11 conversion, but it is a bit redundant and potentially confusing. On the other hand, the API isn't intended to be used by many non-developer users, so maybe it's ok to keep both methods.

grahamgower commented 3 years ago

Ok, do you mind if these methods get renamed to have a leading underscore then? This should remove them from showing in the docs, and indicate to those reading the code that they're for internal use only.

apragsdale commented 3 years ago

Yes, I don't mind, and let's go for that route if it means less confusion. So this means that we are encouraging most users of the API to take route (1), where ancestors and proportions are specified when you introduce the deme?

If we change get_demographic_events to return a dictionary of lists of event types, we should probably remove the splits, branches, mergers, and admixtures attributes from the DemeGraph class, would you say? We should think a bit about whether we want to keep those or not..

grahamgower commented 3 years ago

Yes, I don't mind, and let's go for that route if it means less confusion. So this means that we are encouraging most users of the API to take route (1), where ancestors and proportions are specified when you introduce the deme?

I think having one way to define ancestor/decendant relationships leads to a simpler API, which translates to less confusion for users.

If we change get_demographic_events to return a dictionary of lists of event types, we should probably remove the splits, branches, mergers, and admixtures attributes from the DemeGraph class, would you say? We should think a bit about whether we want to keep those or not..

Yeah, that makes sense too. I'm all for keeping the functionality, if it reduces the amount of duplicated code elsewhere. But these list attributes do seem to be secondary data structures that can be built from an already-complete DemeGraph instance.

apragsdale commented 3 years ago

Perfect - agree on both points.

This change will end up breaking the moments conversion, as well as the fwdpy11... so I can tackle this PR if you haven't started already?

grahamgower commented 3 years ago

Thanks @apragsdale, happy to leave it to you.

grahamgower commented 3 years ago

I probably should have asked this before: are you planning to use the split(), branch(), admix(), and merge() methods to construct demographic models? If not, we might just remove them. (sorry, I thought they were useful for implementing the get_demographic_events() method, but your changes in #101 suggest that's not the case). I think the essential points you had in #7 are now addressed without the need for these methods.

If you really do think they're useful, then we should keep them in the API, and clearly document them as an alternative/complementary way of constructing a demography. Probably we would need diagrams to indicate what each of these events are. But I also note that we don't have explicit keywords for these events in the yaml, so we should also consider if the divergence between yaml and python API is worth it.

apragsdale commented 3 years ago

It might not be worth keeping them. Essentially, we originally had a goal of using the API to dynamically build demographies, so it's similar to the add_epoch() function, where you initialize a deme, maybe add some epochs to it, add another deme, then say "oh their relationship is a branch at this time" and each time you call something like branch() it automatically updates the ancestors and times to be consistent.

It's a bit more chaotic than an approach where you draw a whole demography on a piece of paper and then implement it using just g.deme( ) and specifying proportions, ancestors, all times and epochs in one go. So in some sense it's "simpler" in that those split( ), merge( ), etc functions update deme attributes automatically, but also more confusing because I think it would be too easy for a user to confuse methods of defining a DemeGraph and not realize they're missing some attribute or something didn't get added properly.

So in short, it's an aesthetically pleasing way to define a demography to me, but probably will just cause confusion to most users. Maybe let's cut them out and simplify the code a great deal in the process.

jeromekelleher commented 3 years ago

Maybe let's cut them out and simplify the code a great deal in the process.

I'm not fully following the background here, but these words are always music to my ears!