Closed grahamgower closed 3 years ago
I don't have a good feel for how the Python API is supposed to be used right now so :shrug: Hopefully I'll be more helpful once I've done the ms converter and got my hands dirty a bit.
Presumably, these methods should be internal to
get_demographic_events()
and not exposed in the Python API?
That's right, I think. The way things currently work, is you could 1) define demes and include the ancestors
for the child demes for a split event, e.g., or 2) define the demes without setting ancestors and then call split
, e.g., to set the ancestors and check that the time intersection is correct.
Some of the events, such as merge
, also have the effect of truncating parental demes to end at the time of the merger event, so you could do something like:
import demes
g = demes.DemeGraph(description="test", time_units="generations")
g.deme(id="parent1", initial_size=1000)
g.deme(id="parent2", initial_size=1000)
g.deme(id="child", initial_size=1000, start_time=10)
g.merge(parents=["parent1", "parent2"], child="child", proportions=[0.5, 0.5], time=10)
Then if you print(g.asdict_compact())
, you'll see that ancestors, proportions, and times have been set to create a valid merge event. I set that up when we had goals of allowing flexible incremental building of a DemeGraph
.
It might make the most sense to enforce that a DemeGraph
is constructed only with option 1, and that those demographic events are functions within get_demographic_events()
. I think we still want to have those functions somewhere, as lists of different demographic events are used in moments
/dadi
/fwdpy11
conversion, but it is a bit redundant and potentially confusing. On the other hand, the API isn't intended to be used by many non-developer users, so maybe it's ok to keep both methods.
Ok, do you mind if these methods get renamed to have a leading underscore then? This should remove them from showing in the docs, and indicate to those reading the code that they're for internal use only.
Yes, I don't mind, and let's go for that route if it means less confusion. So this means that we are encouraging most users of the API to take route (1), where ancestors
and proportions
are specified when you introduce the deme
?
If we change get_demographic_events
to return a dictionary of lists of event types, we should probably remove the splits
, branches
, mergers
, and admixtures
attributes from the DemeGraph
class, would you say? We should think a bit about whether we want to keep those or not..
Yes, I don't mind, and let's go for that route if it means less confusion. So this means that we are encouraging most users of the API to take route (1), where
ancestors
andproportions
are specified when you introduce thedeme
?
I think having one way to define ancestor/decendant relationships leads to a simpler API, which translates to less confusion for users.
If we change
get_demographic_events
to return a dictionary of lists of event types, we should probably remove thesplits
,branches
,mergers
, andadmixtures
attributes from theDemeGraph
class, would you say? We should think a bit about whether we want to keep those or not..
Yeah, that makes sense too. I'm all for keeping the functionality, if it reduces the amount of duplicated code elsewhere. But these list attributes do seem to be secondary data structures that can be built from an already-complete DemeGraph instance.
Perfect - agree on both points.
This change will end up breaking the moments
conversion, as well as the fwdpy11
... so I can tackle this PR if you haven't started already?
Thanks @apragsdale, happy to leave it to you.
I probably should have asked this before: are you planning to use the split()
, branch()
, admix()
, and merge()
methods to construct demographic models? If not, we might just remove them. (sorry, I thought they were useful for implementing the get_demographic_events()
method, but your changes in #101 suggest that's not the case). I think the essential points you had in #7 are now addressed without the need for these methods.
If you really do think they're useful, then we should keep them in the API, and clearly document them as an alternative/complementary way of constructing a demography. Probably we would need diagrams to indicate what each of these events are. But I also note that we don't have explicit keywords for these events in the yaml, so we should also consider if the divergence between yaml and python API is worth it.
It might not be worth keeping them. Essentially, we originally had a goal of using the API to dynamically build demographies, so it's similar to the add_epoch()
function, where you initialize a deme, maybe add some epochs to it, add another deme, then say "oh their relationship is a branch at this time" and each time you call something like branch()
it automatically updates the ancestors and times to be consistent.
It's a bit more chaotic than an approach where you draw a whole demography on a piece of paper and then implement it using just g.deme( )
and specifying proportions, ancestors, all times and epochs in one go. So in some sense it's "simpler" in that those split( )
, merge( )
, etc functions update deme attributes automatically, but also more confusing because I think it would be too easy for a user to confuse methods of defining a DemeGraph and not realize they're missing some attribute or something didn't get added properly.
So in short, it's an aesthetically pleasing way to define a demography to me, but probably will just cause confusion to most users. Maybe let's cut them out and simplify the code a great deal in the process.
Maybe let's cut them out and simplify the code a great deal in the process.
I'm not fully following the background here, but these words are always music to my ears!
[EDIT: fat fingers... I somehow submitted the issue before writing it]
The
DemeGraph
hassplit()
,branch()
,admix()
, andmerge()
, methods, but the semantics of how these could be used are not clear. For example, we have thesplit()
method.This suggests that the
parent
andchildren
must already exist in the deme graph (a ValueError will be raised for non-existent children inself[child]
). So the children and the parent were already added with calls todeme()
, but the ancestors field wasn't used? To me, these methods feel redundant at best, and confusing at worst.Presumably, these methods should be internal to
get_demographic_events()
and not exposed in the Python API?