tskit-dev / what-is-an-arg-paper

Manuscript and code for the "What is an ARG?" paper

New section to describe example ARGs #393

Closed: jeromekelleher closed this 10 months ago

jeromekelleher commented 11 months ago

We only discuss the inferred ARGs figure in one paragraph at the moment, and there it serves as a sort of side-effect of the discussion of recombination precision. We want a new section to describe and discuss these example ARGs.

We can call it "Example inferred ARGs" for now, until we think of a better title.

jeromekelleher commented 11 months ago

I think this is a good potential lead-in:

The substantial scalability gains made by recent inference methods are partly due to inferring recombination with lower precision. However, the properties of the resulting ARGs are not well understood, and, in particular, Relate and tsinfer have been mischaracterised as inferring sequences of unrelated trees [cites]. In this section we visually compare the output of recent ARG inference methods on the Kreitman dataset, a standard benchmark in the classical ARG literature, to illustrate the qualitative properties of these ARGs.

See #38 for some links on the mischaracterisation.

jeromekelleher commented 11 months ago

@a-ignatieva can you think of any papers where people have mischaracterised Relate output in particular?

a-ignatieva commented 11 months ago

I think this is a bit tricky: a lot of papers (including those listed in #38) say ambiguous things like "infer a series of local trees" or "infer only the local trees", but this isn't necessarily wrong, depending on interpretation. I don't think it necessarily implies "trees inferred independently in sections of the genome". For example, Hubisz and Siepel (2020) say "The ARG, therefore, can be thought of as being interchangeable with a sequence of local trees and the associated recombination events that transform each tree to the next." I suspect that when most people say Relate and tsinfer don't infer ARGs, it's because these methods output the "sequence of local trees" but not the "associated recombination events", rather than because the local trees are "unrelated". Of course, this is all sort of the point of the paper, but I wonder if it's worth spelling this out explicitly when referring to the other work, rather than just citing it as examples of "incorrect" statements.

a-ignatieva commented 11 months ago

I can have a go at writing this in a para and we can see if we all agree on the message?

jeromekelleher commented 11 months ago

Sure, go ahead. I don't really mind what the message is, as long as it's reasonably easy to follow and gives us a reason to talk about the qualitative differences between the ARGs from the four methods.

I think we do need to point out that people are misrepresenting stuff, or at least that they are confused/being confusing.

hyanwong commented 10 months ago

Fixed in #398