Closed by jeromekelleher 1 year ago
Caption wants a few small tweaks also, but might as well wait until final version of the fig is in place.
FWIW I agree with all of these, apart from
- Change dashed arrows in B to solid (makes them seem like a different "type" of edge)
This doesn't bother me, but I can see the logic. It is helpful, however, to highlight the edges that differ between the two trees in the bottom part of B. Maybe they could be distinguished by shading of colour (although that might still give the impression of a "different sort" of edge). We could justify it by labelling (in the legend) partial-edges as a different colour?
- Don't number the root of the trees in C, D and E as zero (this is misleading, I think). Maybe just drop the node numbers entirely for C and D, as we don't refer to them and it's probably clear enough?
We would need to change the legend, then? Perhaps we need a different symbol for a just-inserted node? We want to make it obvious in C, D, and E which are the nodes we have just inserted. Removing the 0 means that you can't see that so clearly.
- Change the direction of the "copying" arrows. Sequences copy older sequences, in the algorithm (let's not get into the semantics of this)
I think having arrows pointing up (representing the algorithm) risks getting quite confused with the direction of time. Should we perhaps omit the arrows entirely? Also see my comment below about "copying":
Re: the caption, it currently says:
Sc2ts reconstructs the genetic relationships among SARS-CoV-2 genomes by copying samples to all possible ancestors collected at earlier time points (curved arrows)
"copying to ancestors" sounds weird to me (and I think would confuse non-LS experts). Biologically, we copy sequences from ancestors. Could we use "matching" language instead:
Sc2ts reconstructs the genetic relationships among SARS-CoV-2 genomes by matching samples against all possible ancestors collected at earlier time points (curved arrows)
This doesn't bother me, but I can see the logic. It is helpful, however, to highlight the edges that differ between the two trees in the bottom part of B. Maybe they could be distinguished by shading of colour (although that might still give the impression of a "different sort" of edge). We could justify it by labelling (in the legend) partial-edges as a different colour?
I agree - this is what I was suggesting in 6 I think
We would need to change the legend, then? Perhaps we need a different symbol for a just-inserted node? We want to make it obvious in C, D, and E which are the nodes we have just inserted. Removing the 0 means that you can't see that so clearly.
Maybe change the "leaf" nodes to a, b, c, d then or something? I'm worried that people will see these as being the same nodes as we're showing in the "global" ARG above and get confused. Particularly for 0 - I don't want people to think that these trees are always rooted at the reference.
I think having arrows pointing up (representing the algorithm) risks getting quite confused with the direction of time. Should we perhaps omit the arrows entirely? Also see my comment below about "copying":
We can omit the arrows.
I think the caption needs substantial revision - I agree we shouldn't use the term "copying" here
Maybe change the "leaf" nodes to a, b, c, d then or something? I'm worried that people will see these as being the same nodes as we're showing in the "global" ARG above and get confused. Particularly for 0 - I don't want people to think that these trees are always rooted at the reference.
Yeah, great point. Letters, or if we want to keep the idea of numerical indexes and there's enough space for 2 digits, for C, D, E we could use 10, 11, 12, 13, 14, ...?
Sure, I don't mind once it's clear these are different nodes to the global ARG immediately above
- Change the direction of the "copying" arrows. Sequences copy older sequences, in the algorithm (let's not get into the semantics of this)
Removed all the arrowheads.
- Use ISO 8601 in the left col for dates (2020-01)
Reformatted.
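For reference, a minimal sketch of producing year-month labels in the ISO 8601 `YYYY-MM` form requested above. The helper name `year_month_label` and the example months are hypothetical; the figure itself was edited by hand, not generated from code.

```python
from datetime import date


def year_month_label(d):
    """Format a date as an ISO 8601 year-month string, e.g. 2020-01."""
    return f"{d.year:04d}-{d.month:02d}"


# Hypothetical example: labels for the first three months of 2020.
labels = [year_month_label(date(2020, month, 1)) for month in (1, 2, 3)]
```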
- Change "recombinant node" to "recombination node"
Done.
- Change dashed arrows in B to solid (makes them seem like a different "type" of edge)
They are solid now.
- We're missing the recombination node in the local trees
Added them.
- Maybe use different colours for the two recombinant parents of 4 instead, which could be mirrored in the two local trees below?
Hmm, I'm associating dashed lines with recombinant ancestry. If we are colouring them, should we colour the entire parental genomes, say some shades of blue and purple?
- Don't number the root of the trees in C, D and E as zero (this is misleading, I think). Maybe just drop the node numbers entirely for C and D, as we don't refer to them and it's probably clear enough?
I have followed Yan's suggestion to increment the sample node numbers. This is so that I don't have to change the legend on the side, which shows the symbol for a newly added node. In panel C, the existing node 5 in the global ARG serves as the attachment node for new daily samples numbered 7 to 10. In panel D, new daily samples numbered 11 and 12 are attached to the existing node 2 in the global ARG before mutation collapsing. Also, in panel E, it is now shown that a new daily sample numbered 13 is attached to the existing node 1 in the global ARG before reversion pushing.
- Change caption for C to "Tree inference for daily sample cluster"
Done.
Please see working draft below.
In the legend in the figure, there is a symbol for the "copying" paths. What should we call them instead?
Looks good, thanks @szhan!
On second thoughts, "copying path" is good I think, we need to refer to this in some way. We can clarify in the caption and point people to the LS section.
I do think having arrows pointing up would help here in the reference panel. There's surely no confusion about the direction of time, given the trees right next to it, and the great big arrow pointing down showing the direction of time? Younger samples copy from older samples, that's an essential idea to get across.
I'm not sure the numbers in C, D and E help here, I think it'll be more confusing and people will think that these are specific nodes referring to something. We're not actually referring to them anywhere, so they don't need to be labelled.
What if we changed the colour of non-sample nodes to red or blue (or something), and then distinguished newly added samples by the heavy outer ring? So in the legend we just have three circles with slightly different colours, and no numbers in the middle?
Younger samples copy from older samples, that's an essential idea to get across.
Wouldn't most people think that a "copying" arrow should go in the direction of copying (i.e. from original to copy)?
OK, let's just leave them out then
I'm not sure the numbers in C, D and E help here, I think it'll be more confusing and people will think that these are specific nodes referring to something. We're not actually referring to them anywhere, so they don't need to be labelled.
I was thinking about showing how new samples could be added to the global ARG in panel B in new daily batches. But I suppose that it can be confusing and it adds quite a bit of text in the figure legend to explain it all.
Yeah, I think there's enough in there at the moment. We explain the daily batches thing quite a lot in the text, and that's hopefully fairly clear
I'm thinking about whether it is a good idea to mention Wuhan-Hu-1 here. It makes the schematic a bit more concrete, but in principle, the root node could be another sample genome or even a reconstructed ancestral genome (say, built using some existing method using early SARS-CoV-2 genomes).
I like the concreteness, let's keep it.
I've taken your suggestion using three types of node symbols. Indicating non-sample nodes using blue (aero, #7CB9E8) works pretty nicely, I think.
Also, I've coloured the parental genomes using red (vermillion, #E34234) and purple (amethyst, #9966CC). I don't think we need the dashed edges showing recombinant ancestry, so I've removed them.
Now, I wonder if it is confusing that the parent nodes are not coloured the same way as their corresponding genomes.
Two minor edits:
I like the blue for non-sample nodes in C, D, E. It works well for highlighting. But shouldn't the node on the LHS of E be grey?
I like the blue for non-sample nodes in C, D, E. It works well for highlighting. But shouldn't the node on the LHS of E be grey?
Oops, good catch.
I'm not sure whether the parent genomes should be coloured like they are now.
I like it, it shows that 4 is a mosaic of 1 and 2.
Can we move the non-arrow from 4->2 slightly leftwards so it's in the middle of the red bit? Then maybe bend the non-arrow from 4->1 in the other direction?
The easiest thing re colours is to choose them from the standard matplotlib palette.
Okay, I've adjusted the copying paths. I've updated the colours according to the matplotlib palette, but I prefer the light grey (#CACACA) over the darker grey in the palette.
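As a rough illustration of picking colours from the matplotlib palette, here is a sketch that maps a hex colour to its closest entry in matplotlib's default colour cycle. It assumes "the standard matplotlib palette" means the `tab10` cycle, hard-codes those hex values so matplotlib is not needed, and uses plain squared RGB distance, which is a crude measure. The helper names are hypothetical, and this is not necessarily how the colours in the figure were chosen.

```python
# Hex values of matplotlib's default "tab10" colour cycle.
TAB10 = {
    "tab:blue": "#1f77b4",
    "tab:orange": "#ff7f0e",
    "tab:green": "#2ca02c",
    "tab:red": "#d62728",
    "tab:purple": "#9467bd",
    "tab:brown": "#8c564b",
    "tab:pink": "#e377c2",
    "tab:gray": "#7f7f7f",
    "tab:olive": "#bcbd22",
    "tab:cyan": "#17becf",
}


def hex_to_rgb(hex_colour):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def nearest_tab10(hex_colour):
    """Return the tab10 name closest to hex_colour in squared RGB distance."""
    target = hex_to_rgb(hex_colour)

    def dist2(name):
        return sum((a - b) ** 2 for a, b in zip(target, hex_to_rgb(TAB10[name])))

    return min(TAB10, key=dist2)


# The two parental-genome colours mentioned earlier in the thread.
red_match = nearest_tab10("#E34234")     # vermillion
purple_match = nearest_tab10("#9966CC")  # amethyst
```

A perceptual colour space would give better matches than raw RGB; for a figure with only a handful of colours, picking by eye is just as reasonable.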
LGTM! Can you open a PR with the changes to the PDF please? What program are you using for this? If it's not totally massive can you include the source file also please?
I just noticed: the "local trees" are not trees?? I think you need to remove a different branch from each, right?
I use Affinity Designer, which uses a custom format. Probably it is easier to just PR the PDF?
I just noticed: the "local trees" are not trees?? I think you need to remove a different branch from each, right?
Crap, I had it correct before in older versions... I did not remove it when adding a recombination node in the local trees.
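A quick way to sanity-check this kind of thing is to write the edges down and test them. The sketch below uses a hypothetical, simplified edge list (root 0, parents 1 and 2, recombinant 4), not the actual node layout of the figure. The ARG gives node 4 two parents, so it is not a tree; each local tree keeps only one of those two parent edges.

```python
from collections import defaultdict


def is_tree(edges):
    """Return True if the (parent, child) edges form a single rooted tree."""
    parents = defaultdict(list)
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)
        nodes.update((parent, child))
    if not nodes:
        return False
    # In a tree, no node has more than one parent.
    if any(len(p) > 1 for p in parents.values()):
        return False
    # There must be exactly one root (a node with no parent).
    roots = [n for n in nodes if n not in parents]
    if len(roots) != 1:
        return False
    # Every node must be reachable from the root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == nodes


# Hypothetical simplified ARG: node 4 is a mosaic of nodes 1 and 2.
arg_edges = [(0, 1), (0, 2), (1, 4), (2, 4)]
# Each local tree drops a different one of the two parent edges of 4.
left_tree = [(0, 1), (0, 2), (1, 4)]
right_tree = [(0, 1), (0, 2), (2, 4)]
```

Here `is_tree(arg_edges)` is false because node 4 has two parents, while both local trees pass.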
This version is good enough for this preprint then?
Hold on. I want to adjust the copying paths a bit.
In panel B, the inner copying paths now mirror the edges in the ARG on the right.
I didn't get around to updating the caption, will keep this open for now.
Closing this, caption updated
A few small changes would improve the method overview figure I think:
Does this sound ok @szhan?