Closed hyanwong closed 10 months ago
I looked into this and came up with the two tables below. I would appreciate someone double-checking that I did not make a typo. It was pretty straightforward though, since I only really had to list the nodes a-p
and then read the edges off figure 1B. Maybe that's why @jeromekelleher isn't fussed about writing down these tables!
Doing this, questions arose! Of course;)
1) Where do we put these tables? Into \section*{Supplementary figures}, renaming this section to \section*{Supplementary tables and figures} or just \section*{Supplementary}?
2) Do we show just the edge table or also the node table? In the definition of gARG, we say that we will not be using times.
3) If we do show the node table, do we list all the nodes in the pedigree in figure 1 or just those nodes that are accessible from sample genomes a-d? Figures 1A and 1B show all nodes.
\begin{tabular}{c|c}
Node & Time\\
\hline
$\noderef{a}$ & 0\\
$\noderef{b}$ & 0\\
$\noderef{c}$ & 0\\
$\noderef{d}$ & 0\\
$\noderef{e}$ & 1\\
$\noderef{f}$ & 1\\
$\noderef{g}$ & 1\\
$\noderef{h}$ & 1\\
$\noderef{i}$ & 2\\
$\noderef{j}$ & 2 - remove this row?\\
$\noderef{k}$ & 2\\
$\noderef{l}$ & 2 - remove this row?\\
$\noderef{m}$ & 3 - remove this row?\\
$\noderef{n}$ & 3\\
$\noderef{o}$ & 3 - remove this row?\\
$\noderef{p}$ & 3 - remove this row?\\
\end{tabular}
\begin{tabular}{c|c|c}
Child & Parent & Interval\\
\hline
$\noderef{a}$ & $\noderef{e}$ & $[0,2)$\\
$\noderef{a}$ & $\noderef{f}$ & $[2,10)$\\
$\noderef{b}$ & $\noderef{g}$ & $[0,10)$\\
$\noderef{c}$ & $\noderef{f}$ & $[0,7)$\\
$\noderef{c}$ & $\noderef{e}$ & $[7,10)$\\
$\noderef{d}$ & $\noderef{h}$ & $[0,10)$\\
$\noderef{e}$ & $\noderef{i}$ & $[0,10)$\\
$\noderef{f}$ & $\noderef{k}$ & $[0,10)$\\
$\noderef{g}$ & $\noderef{i}$ & $[0,10)$\\
$\noderef{h}$ & $\noderef{k}$ & $[0,10)$\\
$\noderef{i}$ & $\noderef{n}$ & $[0,10)$\\
$\noderef{k}$ & $\noderef{n}$ & $[0,10)$\\
\end{tabular}
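On question 3, the rows marked "remove this row?" can be cross-checked mechanically. Below is a minimal sketch in plain Python, assuming the edge table above is transcribed correctly; the names `EDGES` and `reachable_from` are made up for illustration. It walks child-to-parent links from the samples a-d and ignores the intervals, so it only answers "is this node connected to a sample at all".

```python
from collections import defaultdict

# Edge table transcribed from the tabular above: (child, parent, left, right).
EDGES = [
    ("a", "e", 0, 2), ("a", "f", 2, 10), ("b", "g", 0, 10),
    ("c", "f", 0, 7), ("c", "e", 7, 10), ("d", "h", 0, 10),
    ("e", "i", 0, 10), ("f", "k", 0, 10), ("g", "i", 0, 10),
    ("h", "k", 0, 10), ("i", "n", 0, 10), ("k", "n", 0, 10),
]


def reachable_from(samples, edges):
    """Return every node reachable by following child -> parent links."""
    parents = defaultdict(set)
    for child, parent, _left, _right in edges:
        parents[child].add(parent)
    seen = set(samples)
    stack = list(samples)
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


ALL_NODES = set("abcdefghijklmnop")  # the pedigree nodes a-p
REACHABLE = reachable_from("abcd", EDGES)
UNREACHABLE = sorted(ALL_NODES - REACHABLE)
print(UNREACHABLE)
```

If the transcription is right, the unreachable nodes are the candidates for dropping from a node table restricted to the ancestry of a-d.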
LGTM. The main point for me would be to change the "Interval" column to "Intervals", and each row to being a set, like
$\noderef{a}$ & $\noderef{e}$ & $\{[0,2)\}$\\
It's an important detail that we can have a set of disjoint intervals on a gARG edge (contrasted with the denormalised tskit "edge-interval")
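To make the contrast concrete, here is a rough sketch in plain Python (not tskit code). The nodes x, y, z and the function names are hypothetical and chosen only because no edge in the Fig 1 example carries more than one interval. One function groups denormalised rows into one entry per (child, parent) edge with a list of disjoint intervals; the other flattens them back into one row per interval.

```python
from collections import defaultdict


def group_intervals(rows):
    """Collapse (child, parent, left, right) rows into one entry per edge."""
    grouped = defaultdict(list)
    for child, parent, left, right in rows:
        grouped[(child, parent)].append((left, right))
    return {edge: sorted(intervals) for edge, intervals in grouped.items()}


def denormalise(garg_edges):
    """Flatten edge -> intervals back into one row per (edge, interval)."""
    rows = []
    for (child, parent), intervals in sorted(garg_edges.items()):
        for left, right in sorted(intervals):
            rows.append((child, parent, left, right))
    return rows


# Hypothetical rows: the edge (x, y) appears twice, once per interval.
ROWS = [("x", "y", 0, 3), ("x", "z", 3, 6), ("x", "y", 6, 10)]

GARG_EDGES = group_intervals(ROWS)
# (x, y) is now a single edge annotated with the set {[0,3), [6,10)}.
print(GARG_EDGES)
print(denormalise(GARG_EDGES))
```

The grouped form has two edges and the flattened form has three rows, which is the distinction the "Intervals" column is meant to signal.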
> Where do we put these tables? Into \section*{Supplementary figures} and rename this section to \section*{Supplementary tables and figures} or just \section*{Supplementary}?
Yeah just change to "Supplementary" or something.
There might (just) be enough room to squeeze the table as a figure 1D (to match fig 2C), but it would be very tight. I don't know if this is worth the effort to get working? I assume not, although I would be willing to have a go.
Alternatively, we could put it in as an additional row, rather than another column?
I wouldn't put much effort in though, personally
What about these two questions @jeromekelleher @hyanwong?
> Do we show just the edge table or also the node table? In the definition of gARG, we say that we will not be using times. If I remove time from the node table, then we only end up with one column.
> If we do show the node table, do we list all the nodes in the pedigree in figure 1 or just those nodes that are accessible from sample genomes a-d? Figures 1A and 1B show all nodes.

> There might (just) be enough room to squeeze the table as a figure 1D (to match fig 2C), but it would be very tight. I don't know if this is worth the effort to get working? I assume not, although I would be willing to have a go.
Yeah, a 1x4 layout would be way too much for Figure 1. 2x2 could work.
I will momentarily open a PR and stick the updated tables to Supplementary for now.
Here's what we have now:
I worry that this is actually more confusing than helpful, and trying to be concrete here about the data encoding is contrary to the (important) usage of letters to refer to nodes rather than integer IDs. Having a "node table" that includes a "node" column is quite confusing to me. If we really do want to provide a concrete encoding, it should probably be in terms of IDs, so that people aren't confused. But then we'd have to explain that a=0, b=1 etc. and sort of flip back and forth between them.
Given that it's pretty obvious and simple anyway, maybe not worth the bother?
Or, alternatively, we could show the graph here in the supplement with integer nodes (with an extra "label" column?) and then have a proper data encoding next to it? I would like to be able to show a concrete example of an oriented tree somewhere, which is really hard when all of our nodes have letter labels, so maybe this is worth it?
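For what it's worth, the ID-based encoding with a label column is quick to sketch. The snippet below is plain Python with made-up names (`NODE_ID`, `EDGES_BY_LABEL`, `local_parent_array`), assuming the a=0, b=1, ... mapping mentioned above and the edge list from the draft table. It also builds the oriented tree at a given position as a parent array, using -1 for "no parent" in the same spirit as tskit's NULL.

```python
import string

# Label column: integer ID i carries the letter label LABELS[i] (a=0, b=1, ...).
LABELS = list(string.ascii_lowercase[:16])  # a..p
NODE_ID = {label: i for i, label in enumerate(LABELS)}

# Edge list from the draft table: (child, parent, left, right).
EDGES_BY_LABEL = [
    ("a", "e", 0, 2), ("a", "f", 2, 10), ("b", "g", 0, 10),
    ("c", "f", 0, 7), ("c", "e", 7, 10), ("d", "h", 0, 10),
    ("e", "i", 0, 10), ("f", "k", 0, 10), ("g", "i", 0, 10),
    ("h", "k", 0, 10), ("i", "n", 0, 10), ("k", "n", 0, 10),
]

# The same edges with integer IDs in place of letters.
EDGES_BY_ID = [
    (NODE_ID[child], NODE_ID[parent], left, right)
    for child, parent, left, right in EDGES_BY_LABEL
]


def local_parent_array(edges_by_id, num_nodes, position):
    """Oriented tree at `position`: parent[u] is u's parent, or -1 if none."""
    parent = [-1] * num_nodes
    for child, par, left, right in edges_by_id:
        if left <= position < right:  # half-open interval [left, right)
            parent[child] = par
    return parent


PARENT_AT_0 = local_parent_array(EDGES_BY_ID, len(LABELS), 0)
PARENT_AT_8 = local_parent_array(EDGES_BY_ID, len(LABELS), 8)
print(PARENT_AT_0)
print(PARENT_AT_8)
```

The two arrays differ only in the entries for a and c, which are the two nodes whose parent changes along the genome in this example.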
Do we need time at all? Why not just have the edge table?
By the way, we also have a supplementary figure S1: could we take one of those ARGs and show the edge table for that. We could convert the node labels in there to numbers easily enough, and I don't think it would be too confusing for those coming from Fig 4.
> Do we need time at all? Why not just have the edge table?
Not really, the time isn't necessary for this paper (DAG is sufficient).
Then though, it's even more trivial and literally just listing the graph edges and the things we've annotated them with, so doesn't seem worth the bother.
> By the way, we also have a supplementary figure S1: could we take one of those ARGs and show the edge table for that. We could convert the node labels in there to numbers easily enough, and I don't think it would be too confusing for those coming from Fig 4.
True. Becomes a bit more difficult to forward ref to, and, if not basing on initial example figure, we may as well use a smaller example (which illustrates multiple intervals on an edge)?
I worry that we're getting bogged down on details here that the vast majority of readers aren't going to care about. I mean, we have loads of examples out there, we just need to get the point across that it's the same as tskit and then the job is done?
I agree the tables are underwhelming! I followed this through because of the opened issue. I am happy to remove these tables.
> True. Becomes a bit more difficult to forward ref to, and, if not basing on initial example figure, we may as well use a smaller example (which illustrates multiple intervals on an edge)?
Maybe we change the "current table text" to something like: "We will show an example of data encoding in a table for one of the later examples", or something like this? If we do this, for which example should I build the tables (or just the edge table)?
> I worry that we're getting bogged down on details here that the vast majority of readers aren't going to care about. I mean, we have loads of examples out there, we just need to get the point across that it's the same as tskit and then the job is done?
Happy to remove the tables. In my view, they could strengthen the argument for the gARG though by showing a concrete encoding: images are beautiful and quite easy to follow, but a table is an exact thing. I agree it's a bit redundant, so I am not too fussed if we remove the tables.
OK, let's revisit when more urgent problems have been tackled.
Closing this off as we've removed both this tabular representation and the eARG one.
In Fig 2 we have a tabular version of an eARG, but nowhere do we show a tabular representation of a gARG.
Gregor also makes this point: