tskit-dev / what-is-an-arg-paper

Manuscript and code for the "What is an ARG?" paper

Add a tabular representation of a gARG somewhere #348

Closed hyanwong closed 10 months ago

hyanwong commented 11 months ago

In Fig 2 we have a tabular version of an eARG, but nowhere do we show a tabular representation of a gARG.

It struck us as weird to show the “proper” tabular encoding for an eARG but not for a gARG when we read it through in Sedburgh. Like we were showing how to do the wrong thing in more detail than we were showing how to do the right thing.

Gregor also makes this point:

Since this paper advertises gARG as a data format, this actually becomes an important point; alternatively, make a clear reference to the data format specification!

gregorgorjanc commented 11 months ago

I looked into this and came up with the two tables below. I would appreciate someone double checking that I did not make a typo. It was pretty straightforward though, since I only really had to list the nodes a-p and then read off the edges from figure 1B. Maybe that's why @jeromekelleher isn't fussed about writing down these tables!

Doing this, questions arose! Of course;)

1) Where do we put these tables? Into \section*{Supplementary figures} and rename this section to \section*{Supplementary tables and figures} or just \section*{Supplementary}?

2) Do we show just the edge table or also the node table? In the definition of gARG, we say that we will not be using times.

3) If we do show the node table, do we list all the nodes in the pedigree in figure 1 or just those nodes that are accessible from sample genomes a-d? The figure 1A and 1B show all nodes.

\begin{tabular}{c|c}
Node & Time\\
\hline
$\noderef{a}$ & 0\\
$\noderef{b}$ & 0\\
$\noderef{c}$ & 0\\
$\noderef{d}$ & 0\\
$\noderef{e}$ & 1\\
$\noderef{f}$ & 1\\
$\noderef{g}$ & 1\\
$\noderef{i}$ & 1\\
$\noderef{j}$ & 2 - remove this row?\\
$\noderef{k}$ & 2\\
$\noderef{l}$ & 2 - remove this row?\\
$\noderef{m}$ & 3 - remove this row?\\
$\noderef{n}$ & 3\\
$\noderef{o}$ & 3 - remove this row?\\
$\noderef{p}$ & 3 - remove this row?\\
\end{tabular}

\begin{tabular}{c|c|c}
Child & Parent & Interval\\
\hline
$\noderef{a}$ & $\noderef{e}$ & $[0,2)$\\
$\noderef{a}$ & $\noderef{f}$ & $[2,10)$\\
$\noderef{b}$ & $\noderef{g}$ & $[0,10)$\\
$\noderef{c}$ & $\noderef{f}$ & $[0,7)$\\
$\noderef{c}$ & $\noderef{e}$ & $[7,10)$\\
$\noderef{d}$ & $\noderef{h}$ & $[0,10)$\\
$\noderef{e}$ & $\noderef{i}$ & $[0,10)$\\
$\noderef{f}$ & $\noderef{k}$ & $[0,10)$\\
$\noderef{g}$ & $\noderef{i}$ & $[0,10)$\\
$\noderef{h}$ & $\noderef{k}$ & $[0,10)$\\
$\noderef{i}$ & $\noderef{n}$ & $[0,10)$\\
$\noderef{k}$ & $\noderef{n}$ & $[0,10)$\\
\end{tabular}
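
One way to act on the double-checking request above is to transcribe both tables and test them mechanically. The following is a minimal sketch, not part of the manuscript: the names `node_time`, `edges` and `check_tables` are hypothetical, and the data are copied from the two tables exactly as posted.

```python
# Node table as posted in this comment: node label -> time.
node_time = {
    "a": 0, "b": 0, "c": 0, "d": 0,
    "e": 1, "f": 1, "g": 1, "i": 1,
    "j": 2, "k": 2, "l": 2,
    "m": 3, "n": 3, "o": 3, "p": 3,
}

# Edge table as posted: (child, parent, left, right), half-open [left, right).
edges = [
    ("a", "e", 0, 2), ("a", "f", 2, 10),
    ("b", "g", 0, 10),
    ("c", "f", 0, 7), ("c", "e", 7, 10),
    ("d", "h", 0, 10),
    ("e", "i", 0, 10), ("f", "k", 0, 10),
    ("g", "i", 0, 10), ("h", "k", 0, 10),
    ("i", "n", 0, 10), ("k", "n", 0, 10),
]


def check_tables(node_time, edges):
    """Return (nodes missing from the node table, edges with parent not older than child)."""
    used = {node for child, parent, _, _ in edges for node in (child, parent)}
    missing = sorted(used - set(node_time))
    bad = [
        (child, parent)
        for child, parent, _, _ in edges
        # Only compare times when both endpoints are listed in the node table.
        if child in node_time and parent in node_time
        and node_time[parent] <= node_time[child]
    ]
    return missing, bad


missing_nodes, bad_time_edges = check_tables(node_time, edges)
print("nodes used by edges but absent from the node table:", missing_nodes)
print("edges whose parent is not older than the child:", bad_time_edges)
```

On the tables as transcribed here, this reports `h` as absent from the node table and flags the edges e to i and g to i as having equal times. That may mean the time-1 row labelled `i` was intended to be `h`, with `i` belonging at time 2, but this would need checking against figure 1.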

jeromekelleher commented 11 months ago

LGTM. The main point for me would be to change the "Interval" column to "Intervals", and each row to being a set, like

$\noderef{a}$ & $\noderef{e}$ & $\{[0,2)\}$\\

It's an important detail that we can have a set of disjoint intervals on a gARG edge (in contrast to the denormalised tskit "edge-interval").

jeromekelleher commented 11 months ago

Where do we put these tables? Into \section*{Supplementary figures} and rename this section to \section*{Supplementary tables and figures} or just \section*{Supplementary}?

Yeah just change to "Supplementary" or something.

hyanwong commented 11 months ago

There might (just) be enough room to squeeze the table as a figure 1D (to match fig 2C), but it would be very tight. I don't know if this is worth the effort to get working? I assume not, although I would be willing to have a go.

jeromekelleher commented 11 months ago

Alternatively, could put as an additional row, rather than another column?

I wouldn't put much effort in though, personally

gregorgorjanc commented 11 months ago

What about these two questions @jeromekelleher @hyanwong?

  1. Do we show just the edge table or also the node table? In the definition of gARG, we say that we will not be using times. If I remove time from the node table, then we only end up with one column.

  2. If we do show the node table, do we list all the nodes in the pedigree in figure 1 or just those nodes that are accessible from sample genomes a-d? The figure 1A and 1B show all nodes.

gregorgorjanc commented 11 months ago

There might (just) be enough room to squeeze the table as a figure 1D (to match fig 2C), but it would be very tight. I don't know if this is worth the effort to get working? I assume not, although I would be willing to have a go.

Yeah, a 1x4 layout would be way too much for Figure 1. A 2x2 layout could work.

I will open a PR shortly and put the updated tables in the Supplementary for now.

jeromekelleher commented 11 months ago

Here's what we have now:

[Image: Screenshot from 2023-10-16 10-22-46]

I worry that this is actually more confusing than helpful, and trying to be concrete here about the data encoding is contrary to the (important) usage of letters to refer to nodes rather than integer IDs. Having a "node table" that includes a "node" column is, to me, quite confusing. If we really do want to provide a concrete encoding, it should probably be in terms of IDs, so that people aren't confused. But then we'd have to explain that a=0, b=1 etc. and sort of flip back and forth between them.

Given that it's pretty obvious and simple anyway, maybe not worth the bother?

Or, alternatively, we could show the graph here in the supplement with integer nodes (with an extra "label" column?) and then have a proper data encoding next to it? I would like to be able to show a concrete example of an oriented tree somewhere, which is really hard when all of our nodes have letter labels, so maybe this is worth it?
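
As a rough illustration of the integer-ID idea, the sketch below assigns IDs alphabetically (a=0, b=1, ...), keeps the letters as a label column, and encodes the local tree at one position as an oriented tree: a parent array in which `parent[u]` is the ID of the parent of node `u`, with -1 marking a root or a node absent from that tree. The edge list is the one from the edge table posted earlier in this thread. The ID assignment, the -1 sentinel and the function name `oriented_tree` are assumptions made for illustration only.

```python
import string

# Label column: integer ID -> letter label, for nodes a..p.
labels = list(string.ascii_lowercase[:16])
node_id = {label: i for i, label in enumerate(labels)}

# (child, parent, left, right) rows from the edge table posted above.
edges = [
    ("a", "e", 0, 2), ("a", "f", 2, 10),
    ("b", "g", 0, 10),
    ("c", "f", 0, 7), ("c", "e", 7, 10),
    ("d", "h", 0, 10),
    ("e", "i", 0, 10), ("f", "k", 0, 10),
    ("g", "i", 0, 10), ("h", "k", 0, 10),
    ("i", "n", 0, 10), ("k", "n", 0, 10),
]


def oriented_tree(edges, position, node_id):
    """Parent array for the local tree covering `position` (-1 means no parent)."""
    parent = [-1] * len(node_id)
    for child, par, left, right in edges:
        if left <= position < right:  # half-open interval test
            parent[node_id[child]] = node_id[par]
    return parent


tree_at_0 = oriented_tree(edges, 0, node_id)
tree_at_8 = oriented_tree(edges, 8, node_id)
for u, (label, p) in enumerate(zip(labels, tree_at_0)):
    print(u, label, p)
```

Comparing `tree_at_0` with `tree_at_8` shows only the entries for `a` and `c` changing, which matches the two edges in the table that switch parent along the genome.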

hyanwong commented 11 months ago

Do we need time at all? Why not just have the edge table?

By the way, we also have a supplementary figure S1: could we take one of those ARGs and show the edge table for that. We could convert the node labels in there to numbers easily enough, and I don't think it would be too confusing for those coming from Fig 4.

jeromekelleher commented 11 months ago

Do we need time at all? Why not just have the edge table?

Not really, the time isn't necessary for this paper (DAG is sufficient).

Then though, it's even more trivial and literally just listing the graph edges and the things we've annotated them with, so doesn't seem worth the bother.
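
For what "DAG is sufficient" amounts to in practice, here is a minimal sketch that drops times altogether and only checks that the child-parent pairs from the edge table above are acyclic. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the helper name `is_dag` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Distinct (child, parent) pairs from the edge table above, intervals dropped.
child_parent = [
    ("a", "e"), ("a", "f"), ("b", "g"), ("c", "f"), ("c", "e"), ("d", "h"),
    ("e", "i"), ("f", "k"), ("g", "i"), ("h", "k"), ("i", "n"), ("k", "n"),
]


def is_dag(child_parent):
    """Return (True, child-before-parent ordering) or (False, []) if a cycle exists."""
    # TopologicalSorter expects node -> predecessors; children precede parents.
    graph = {}
    for child, parent in child_parent:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False, []
    return True, order


ok, order = is_dag(child_parent)
print(ok, order)
```

Any ordering returned this way places every child before its parents, which is the only constraint node times would otherwise impose on the graph.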

By the way, we also have a supplementary figure S1: could we take one of those ARGs and show the edge table for that. We could convert the node labels in there to numbers easily enough, and I don't think it would be too confusing for those coming from Fig 4.

True. Becomes a bit more difficult to forward ref to, and, if not basing on initial example figure, we may as well use a smaller example (which illustrates multiple intervals on an edge)?

I worry that we're getting bogged down on details here that the vast majority of readers aren't going to care about. I mean, we have loads of examples out there, we just need to get the point across that it's the same as tskit and then the job is done?

gregorgorjanc commented 11 months ago

I agree the tables are underwhelming! I followed this through because of the opened issue. I am happy to remove these tables.

True. Becomes a bit more difficult to forward ref to, and, if not basing on initial example figure, we may as well use a smaller example (which illustrates multiple intervals on an edge)?

Maybe we change the "current table text" to something like: "We will show an example of data encoding in a table for one of the later examples", or something like this? If we do this, for which example should I build the tables (or just the edge table)?

I worry that we're getting bogged down on details here that the vast majority of readers aren't going to care about. I mean, we have loads of examples out there, we just need to get the point across that it's the same as tskit and then the job is done?

Happy to remove the tables. In my view, they could strengthen the argument for the gARG though by showing a concrete encoding: images are beautiful and quite easy to follow, but a table is an exact thing. I agree it's a bit redundant, so I am not too fussed if we remove the tables.

jeromekelleher commented 11 months ago

OK, let's revisit when more urgent problems have been tackled.

jeromekelleher commented 10 months ago

Closing this off as we've removed both this tabular representation and the eARG one.