tskit-dev / what-is-an-arg-paper

Manuscript and code for the "What is an ARG?" paper

Clarification on eARGs vs gARGs - when are they not equivalent? #352

Closed hyanwong closed 10 months ago

hyanwong commented 11 months ago

Nick Barton's feedback includes:

I don’t find the terminology “eARG” vs “gARG” ideal. As I understand it, they are equivalent, provided that one includes enough information about each “event” - the real issue is in how much detail we describe the ARG, not whether it is described in terms of “events” or “genomes”.

We should make it clear how and when they may not be equivalent.

hyanwong commented 11 months ago

@jeromekelleher says:

I think we need an extra para at the end of the eARG section, driving home the point that our definition is by necessity loose because it encompasses different definitions, and those definitions are themselves very imprecise. They just say "events" without stating what information you need to store about those events. The Mathieson and Scally definition is closest to a gARG but still differs in important ways, not least that they don't specify mechanisms for embedding trees.

That para would give us a place to nip a few different misunderstandings in the bud, I think.

jeromekelleher commented 11 months ago

Another point to make:

"Focusing on evolutionary events becomes cumbersome when we consider instances of, for example, sampling within multigenerational pedigrees or over time (e.g. SC2 in discussion). Does the inheritance of one sample from another, without any coalescence or recombination, constitute an "event"?" (etc.)

hyanwong commented 11 months ago

So the two points to push back against Nick's argument ("As I understand it, a gARG and eARG are equivalent, provided that one includes enough information about each “event”") are:

  1. You can losslessly convert any eARG to a gARG, but not necessarily the other way round (i.e. they aren't equivalent), because:
     a. a gARG without recombination nodes can't be uniquely mapped to a single eARG;
     b. a gARG can contain information that isn't associated with a specific "event" (Jerome's point above).
  2. If you say "events" then you have to define what sort of events you allow (in Nick's terminology, what information you include - e.g. multiple breakpoints, delimitations on gene conversion tracts, multiple children via polytomies, and other things we might not have thought about). Unless you have an exhaustive list of event types, it is impossible to create a "general" eARG storage format, because the eARG is "process-dependent". We claim that you can bypass this by using the gARG to look at outcomes.
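
A rough sketch of the asymmetry in the two points above, in plain Python. Everything here is hypothetical and made up for illustration: the `Edge` record, the `EARG` dictionary layout, the event kinds and the helper names are not the paper's or tskit's data model. It assumes an eARG whose only event kinds are "sample", "coalescence" and a single-breakpoint "recombination", and a gARG stored as edges annotated with half-open inheritance intervals.

```python
from dataclasses import dataclass

SEQ_LEN = 100  # hypothetical genome length


@dataclass(frozen=True)
class Edge:
    """One gARG edge: `child` inherits [left, right) from `parent`."""
    child: int
    parent: int
    left: int
    right: int


# Hypothetical eARG: each node is an event. A recombination event stores
# one breakpoint and its (left parent, right parent) in that order.
EARG = {
    0: {"kind": "sample", "parents": [2]},
    1: {"kind": "sample", "parents": [3]},
    2: {"kind": "recombination", "breakpoint": 40, "parents": [3, 4]},
    3: {"kind": "coalescence", "parents": [4]},
    4: {"kind": "coalescence", "parents": []},
}


def earg_to_garg(earg, seq_len=SEQ_LEN):
    """Turn every event node into a genome node with interval-annotated edges."""
    edges = []
    for child, event in earg.items():
        if event["kind"] == "recombination":
            left_parent, right_parent = event["parents"]
            bp = event["breakpoint"]
            edges.append(Edge(child, left_parent, 0, bp))
            edges.append(Edge(child, right_parent, bp, seq_len))
        else:
            # Non-recombinant nodes pass on the whole genome to each parent link.
            for parent in event["parents"]:
                edges.append(Edge(child, parent, 0, seq_len))
    return edges


def single_breakpoint(edges, child, seq_len=SEQ_LEN):
    """Return the breakpoint if the child's edges match one crossover, else None."""
    mine = sorted((e for e in edges if e.child == child), key=lambda e: e.left)
    if len(mine) != 2:
        return None
    a, b = mine
    tiles_genome = a.left == 0 and a.right == b.left and b.right == seq_len
    if tiles_genome and a.parent != b.parent:
        return a.right
    return None


garg = earg_to_garg(EARG)
recovered = single_breakpoint(garg, 2)

# A gARG can also hold a gene-conversion-like pattern: node 5 inherits a
# short tract from node 7 and the rest from node 6. The event vocabulary
# assumed above has no event kind that describes this.
gc_edges = [Edge(5, 6, 0, 10), Edge(5, 7, 10, 20), Edge(5, 6, 20, SEQ_LEN)]
unmatched = single_breakpoint(gc_edges, 5)
```

Under these assumptions the forward conversion keeps the breakpoint, so it can be read back from the edges of node 2, whereas `gc_edges` is a perfectly valid edge list that the assumed event kinds cannot express without adding a new event type. That is only an illustration of why the event list would need to be exhaustive, not a proof of non-equivalence.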

Nick would argue that point 1 is covered by his "as long as you include enough information" clause. But (a) often you can't or don't want to include this information, and I can't see a clear way to encode this as a set of fuzzy events, and (b) you still fall foul of point 2.

jeromekelleher commented 10 months ago

I think we've probably closed this down now.