Tweaks for method overview figure

jeromekelleher commented 1 year ago

A few small changes would improve the method overview figure I think:

Change the direction of the "copying" arrows. Sequences copy older sequences, in the algorithm (let's not get into the semantics of this)
Use ISO 8601 in the left col for dates (2020-01)
Change "recombinant node" to "recombination node"
Change dashed arrows in B to solid (makes them seem like a different "type" of edge)
We're missing the recombination node in the local trees
Maybe use different colours for the two recombinant parents of 4 instead, which could be mirrored in the two local trees below?
Don't number the root of the trees in C, D and E as zero (this is misleading, I think). Maybe just drop the node numbers entirely for C and D, as we don't refer to them and it's probably clear enough?
Change caption for C to "Tree inference for daily sample cluster"

Does this sound ok @szhan?

jeromekelleher commented 1 year ago

Caption wants a few small tweaks also, but might as well wait until final version of the fig is in place.

hyanwong commented 1 year ago

FWIW I agree with all of these, apart from

Change dashed arrows in B to solid (makes them seem like a different "type" of edge)

This doesn't bother me, but I can see the logic. It is helpful, however, to highlight the edges that differ between the two trees in the bottom part of B. Maybe they could be distinguished by shading of colour (although that might still give the impression of a "different sort" of edge. We could justify it by labelling (in the legend) partial-edges as a different colour?

Don't number the root of the trees in C, D and E as zero (this is misleading, I think). Maybe just drop the node numbers entirely for C and D, as we don't refer to them and it's probably clear enough?

We would need to change the legend, then? Perhaps we need a different symbol for a just-inserted node? We want to make it obvious in C, D, and E which are the nodes we have just inserted. Removing the 0 means that you can't see that so clearly.

Change the direction of the "copying" arrows. Sequences copy older sequences, in the algorithm (let's not get into the semantics of this)

I think having arrows pointing up (representing the algorithm) risks getting quite confused with the direction of time. Should we perhaps omit the arrows entirely? Also see my comment below about "copying":

Re: the caption, it currently says:

Sc2ts reconstructs the genetic relationships among SARS- CoV-2 genomes by copying samples to all possible ancestors collected at earlier time points (curved arrows)

"copying to ancestors" sounds weird to me (and I think would confuse non LS experts). Biologically, we copy sequences from ancestors. Could we use "matching" language instead:

Sc2ts reconstructs the genetic relationships among SARS- CoV-2 genomes by matching samples against all possible ancestors collected at earlier time points (curved arrows)

jeromekelleher commented 1 year ago

This doesn't bother me, but I can see the logic. It is helpful, however, to highlight the edges that differ between the two trees in the bottom part of B. Maybe they could be distinguished by shading of colour (although that might still give the impression of a "different sort" of edge. We could justify it by labelling (in the legend) partial-edges as a different colour?

I agree - this is what I was suggesting in 6 I think

jeromekelleher commented 1 year ago

We would need to change the legend, then? Perhaps we need a different symbol for a just-inserted node? We want to make it obvious in C, D, and E which are the nodes we have just inserted. Removing the 0 means that you can't see that so clearly.

Maybe change the "leaf" nodes to a, b, c, d then or something? I'm worried that people will see these as being the same nodes as we're showing in the "global" ARG above and get confused. Particularly for 0 - I don't want people to think that these trees are always rooted at the reference.

jeromekelleher commented 1 year ago

I think having arrows pointing up (representing the algorithm) risks getting quite confused with the direction of time. Should we perhaps omit the arrows entirely? Also see my comment below about "copying":

We can omit at the arrows.

I think the caption needs substantial revision - I agree we shouldn't use the term "copying" here

hyanwong commented 1 year ago

Maybe change the "leaf" nodes to a, b, c, d then or something? I'm worried that people will see these as being the same nodes as we're showing in the "global" ARG above and get confused. Particularly for 0 - I don't want people to think that these trees are always rooted at the reference.

Yeah, great point. Letters, or if we want to keep the idea of numerical indexes and there's enough space for 2 digits, for C, D, E we could use 10, 11, 12, 13, 14, ...?

jeromekelleher commented 1 year ago

Sure, I don't mind once it's clear these are different nodes to the global ARG immediately above

szhan commented 1 year ago

Change the direction of the "copying" arrows. Sequences copy older sequences, in the algorithm (let's not get into the semantics of this)

Removed all the arrowheads.

Use ISO 8601 in the left col for dates (2020-01)

Reformatted.

Change "recombinant node" to "recombination node"

Done.

Change dashed arrows in B to solid (makes them seem like a different "type" of edge)

They are solid now.

We're missing the recombination node in the local trees

Added them.

Maybe use different colours for the two recombinant parents of 4 instead, which could be mirrored in the two local trees below?

Hmm, I'm associating dashed lines with recombinant ancestry. If we are colouring them, should we colour the entire parental genomes, say some shades of blue and purple?

Don't number the root of the trees in C, D and E as zero (this is misleading, I think). Maybe just drop the node numbers entirely for C and D, as we don't refer to them and it's probably clear enough?

I have followed Yan's suggestion to increment the sample node numbers. This is so that I don't have to change the legend on the side, which indicates that the symbol for a newly added node. In panel C, the existing node 5 in the global ARG serves as the attachment node for new daily samples numbered 7 to 10. In panel D, new daily samples numbered 11 and 12 are attached to the existing node 2 in the global ARG before mutation collapsing. Also, in panel E, it is now shown that a new daily sample numbered 13 is attached to the existing node 1 in the global ARG before reversion pushing.

Change caption for C to "Tree inference for daily sample cluster"

Done.

Please see working draft below.

szhan commented 1 year ago

In the legend in the figure, there is a symbol for the "copying" paths. What should we call them instead?

jeromekelleher commented 1 year ago

Looks good, thanks @shan!

On second thoughts, "copying path" is good I think, we need to refer to this in some way. We can clarify in the caption and point people to the LS section.

I do think have arrows pointing up would help here in the reference panel. There's surely no confusion about the direction of time, given the trees right next to it, and the great big arrow pointing down showing the direction of time? Younger samples copy from older samples, that's an essential idea to get across.

I'm note sure the numbers in C, D and E help here, I think it'll be more confusing and people will think that these are specific nodes referring to something. We're not actually referring to them anywhere, so they don't need to be labelled.

What if we changed the colour of non-sample nodes to red or blue (or something), and then distinguished new-ly added samples by the heavy outer ring? So in the legend we just have three circles with slightly different colours, and no numbers in the middle?

hyanwong commented 1 year ago

Younger samples copy from older samples, that's an essential idea to get across.

Wouldn't most people think that a "copying" arrow should go in the direction of copying (i.e. from original to copy)?

jeromekelleher commented 1 year ago

OK, let's just leave them out then

szhan commented 1 year ago

I'm note sure the numbers in C, D and E help here, I think it'll be more confusing and people will think that these are specific nodes referring to something. We're not actually referring to them anywhere, so they don't need to be labelled.

I was thinking about showing how new samples could be added to the global ARG in panel B in new daily batches. But I suppose that it can be confusing and it adds quite a bit of text in the figure legend to explain it all.

jeromekelleher commented 1 year ago

Yeah, I think there's enough in there at the moment. We explain the daily batches thing quite a lot in the text, and that's hopefully fairly clear

szhan commented 1 year ago

I'm thinking about whether it is a good idea to mention Wuhan-Hu-1 here. It makes the schematic a bit more concrete, but in principle, the root node could be another sample genome or even a reconstructed ancestral genome (say, built using some existing method using early SARS-CoV-2 genomes).

jeromekelleher commented 1 year ago

I like the concreteness, lets keep it.

szhan commented 1 year ago

I've taken your suggestion using three types of node symbols. Indicating non-sample nodes using blue (aero, #7CB9E8) works pretty nicely, I think.

Also, I've coloured the parental genomes using red (vermillion, #E34234) and purple (amethyst, #9966CC). I don't think we need the dashed edges showing recombinant ancestry, so I've removed them.

Now, I wonder if it is confusing that the parent nodes are not coloured the same way as their corresponding genomes.

Two minor edits:

The string specifying date format along the y-axis is capitalised.
The genome of the causal recombinant genome (Sample 4) is coloured by red and purple. The red part matching the left parent ends at the second variable site. This should reflect the traceback path obtained in a forward HMM run. Earlier, I had it the other way around.

hyanwong commented 1 year ago

I like the blue for non-sample nodes in C, D, E. It works well for highlighting. But shouldn't the node on the LHS of E be grey?

szhan commented 1 year ago

I like the blue for non-sample nodes in C, D, E. It works well for highlighting. But shouldn't the node on the LHS of E be grey?

Oops, good catch.

szhan commented 1 year ago

I'm not sure whether the parent genomes should be coloured like they are now.

jeromekelleher commented 1 year ago

I like it, it shows that 4 is a mosaic of 1 and 2.

Can we move the non-arrow from 4->2 slightly leftwards so it's in the middle of the red bit. Then maybe bend the non-arrow from 4->1 in the other direction?

The easiest thing re colours is to choose them from the standard matplotlib pallete

maplotbli

szhan commented 1 year ago

Okay, I've adjusted the copying paths. I've updated the colours according to the matplotlib palette, but I prefer the light grey (#CACACA) over the darker grey in the palette.

jeromekelleher commented 1 year ago

LGTM! Can you open a PR with the changes to the PDF please? What program are you using for this? If it's not totally massive can you include the source file also please?

hyanwong commented 1 year ago

I just noticed: the "local trees" are not trees?? I think you need to remove a different branch from each, right?

szhan commented 1 year ago

I use Affinity Designer, which uses a custom format. Probably it is easier to just PR the PDF?

szhan commented 1 year ago

I just noticed: the "local trees" are not trees?? I think you need to remove a different branch from each, right?

Crap, I had it correct before in older versions... I did not remove it when adding a recombination node in the local trees.

szhan commented 1 year ago

This version is good enough for this preprint then?

szhan commented 1 year ago

Hold on. I want to adjust the copying paths a bit.

szhan commented 1 year ago

In panel B, the inner copying paths now mirror the edges in the ARG on the right.

jeromekelleher commented 1 year ago

I didn't get around to updating the caption, will keep this open for now.

jeromekelleher commented 1 year ago

Closing this, caption updated

jeromekelleher / sc2ts-paper

Tweaks for method overview figure #131