matsengrp / cft

Clonal family tree

Is there a way to toggle between different timepoints on the trees? #165

Closed lauradoepker closed 7 years ago

lauradoepker commented 7 years ago

@metasoarous

Hi there - Megan (hopefully for the last time) on Laura's account.

Thanks! Megan

metasoarous commented 7 years ago

Sigh...

Right as I was finishing up all the merged-timepoints work, I thought, "Should I check to see if it would be useful to continue processing separate timepoints separately?" And after consideration I said to myself, "Nah! More trouble than it's worth. No one's going to want to see the timepoints separately." Should have checked...

There are a couple of different things that could be done here:

  1. Just trim the trees to include only the sequences from the given timepoint. In general, folks (@matsen in particular) tend to prefer rebuilding a tree on the subset over pruning an existing one, for various reasons, so this isn't really the top choice, but it is easy.
  2. Retool the pipeline to compute each timepoint separately again. This could maybe be a dataset parameter, so that you can flip between a merged dataset and datasets for separate timepoints.

2 is probably the way to go here.
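For comparison, option 1 is easy because pruning is a purely structural operation on a tree that already exists. A minimal Python sketch, assuming a toy `Node` class (hypothetical, not cft's tree representation) and a set of leaf names known to come from the chosen timepoint: drop leaves not in the set, drop subtrees left empty, and collapse any internal node left with a single child, summing branch lengths so root-to-leaf distances are preserved.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Toy rooted tree node (hypothetical; not cft's tree type)."""
    name: Optional[str] = None
    length: float = 0.0  # branch length to parent
    children: List["Node"] = field(default_factory=list)


def prune_to(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a pruned copy holding only leaves whose names are in `keep`.

    Subtrees with no kept leaves are dropped; a node left with a single
    child is collapsed into it, summing branch lengths so root-to-leaf
    distances are preserved. Returns None if nothing is kept.
    """
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune_to(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def leaf_names(node: Node) -> List[str]:
    """Leaf names in left-to-right order."""
    if not node.children:
        return [node.name]
    return [n for ch in node.children for n in leaf_names(ch)]


# Example: leaves "a" and "c" stand in for the (hypothetical) 462dpi sequences.
tree = Node("root", 0.0, [
    Node("x", 1.0, [Node("a", 1.0), Node("b", 1.0)]),
    Node("y", 1.0, [Node("c", 1.0), Node("d", 1.0)]),
])
pruned = prune_to(tree, {"a", "c"})
print(leaf_names(pruned))  # ['a', 'c']
```

The pruned topology is still whatever the merged inference produced, including any influence from the other timepoint's sequences; rebuilding on the subset (option 2) lets tree inference see only that timepoint, which is one reason to prefer it.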

metasoarous commented 7 years ago

@lauranoges @meganstumpf Does having separate datasets for merged vs 462dpi-alone timepoints seem like a fine solution? Then you could just select one or the other in the datasets dropdown at the top right, depending on what you want to look at.

There's some work to do here in figuring out which timepoints we run, and how that interacts with the way datasets get set up. I may also need to chat with @psathyrella about this in relation to #167 and how we model all the relationships in the data.
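Much of option 2 comes down to deciding which sequence sets get their own run. A minimal Python sketch, assuming the input is a list of (sequence id, timepoint) pairs; `dataset_specs`, the `"merged"` key, and the `other_tp` label are hypothetical and not part of cft's build configuration. It always produces the merged dataset and adds one dataset per requested timepoint, so each could show up as its own entry in the datasets dropdown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def dataset_specs(
    seqs: Iterable[Tuple[str, str]],
    separate: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Group (sequence_id, timepoint) pairs into per-run datasets.

    Always emits a "merged" dataset with every timepoint for the patient,
    plus one dataset per timepoint in `separate` (every timepoint if None).
    Each dataset would then be partitioned and have trees built independently.
    """
    seqs = list(seqs)
    by_tp: Dict[str, List[str]] = defaultdict(list)
    for seq_id, tp in seqs:
        by_tp[tp].append(seq_id)
    wanted = sorted(by_tp) if separate is None else list(separate)
    specs: Dict[str, List[str]] = {"merged": [seq_id for seq_id, _ in seqs]}
    for tp in wanted:
        if tp not in by_tp:  # membership test does not create a key
            raise KeyError(f"no sequences for timepoint {tp!r}")
        specs[tp] = list(by_tp[tp])
    return specs


seqs = [("s1", "462dpi"), ("s2", "462dpi"), ("s3", "other_tp")]
print(dataset_specs(seqs, separate=["462dpi"]))
# {'merged': ['s1', 's2', 's3'], '462dpi': ['s1', 's2']}
```

Making the per-timepoint list a parameter keeps the default run cheap while still allowing a one-off run such as 462dpi alone.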

lauradoepker commented 7 years ago

Yes @metasoarous, that would be sufficient. Just to clarify, when you say "merged," do you always mean "all timepoints from a given patient in the same dataset"?

meganstumpf commented 7 years ago

I'm on board with that solution as well, pending clarification of Laura's point. I'm assuming the 462dpi-alone set will bring back the hits for QA255.006VH? A major goal for me would be to have access to the data for QA255.006VH again.

metasoarous commented 7 years ago

Yes; that's what I mean by "merged" (see #10). This is what you presently see on the http://stoat:5555 deployment.

metasoarous commented 7 years ago

In meeting with @meganstumpf and @lauranoges: picks were made on Jan 4th on (probably) stoat:5000. Timepoint 462dpi (the older one) is the only one she cares about. May 19th deadline.

It's gonna be hard to get that specific dataset back, so I'm going to try rerunning with just that timepoint. Means adding this ability to the build pipeline.

meganstumpf commented 7 years ago

@metasoarous Just checking in - do we have an update on when the 462dpi-only dataset will be viewable?

lauradoepker commented 7 years ago

@metasoarous and I just chatted and discussed a few issues about merged trees:

1) When downsampling, we're curious how few leaves we can get away with and still generate a tree that is representative of the larger clonal family. I understand this is subjective, but is the 100-sequence cutoff good? Overly big? Too small?

2) With the downsampling target size (i.e., 100 sequences) in mind, we're curious whether we should be selecting equal numbers of leaves from each timepoint so as not to bias trees toward or away from certain datasets (e.g., the early dataset gets almost completely pruned out while the later dataset takes over). Maybe this concern will evaporate when we institute the duplicity pie charts that represent the sizes of thinned clades? Otherwise, should we try picking 50(?) sequences from each timepoint?
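The "equal numbers from each timepoint" idea in point 2 is stratified sampling. A minimal Python sketch, assuming sequences arrive as (sequence id, timepoint) pairs; `stratified_downsample` is hypothetical, and plain random choice within each timepoint stands in for whatever selection criterion the pipeline actually uses. Timepoints take turns contributing one sequence, so a timepoint with fewer sequences than its share keeps all of them and the leftover slots go to the others.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_downsample(
    seqs: Sequence[Tuple[str, str]], target: int, seed: int = 0
) -> List[str]:
    """Pick up to `target` sequence ids, spreading picks across timepoints.

    Timepoints take turns contributing one randomly chosen sequence each
    (round-robin). A timepoint that runs out simply drops out, so its unused
    share goes to the timepoints that still have sequences.
    """
    rng = random.Random(seed)
    by_tp: Dict[str, List[str]] = defaultdict(list)
    for seq_id, tp in seqs:
        by_tp[tp].append(seq_id)
    # Shuffled copy of each timepoint's pool, in a stable timepoint order.
    pools = {tp: rng.sample(ids, len(ids)) for tp, ids in sorted(by_tp.items())}
    picked: List[str] = []
    while len(picked) < target and any(pools.values()):
        for pool in pools.values():
            if pool and len(picked) < target:
                picked.append(pool.pop())
    return picked


# 20 early (462dpi) vs 300 later (hypothetical "other_tp") sequences, downsampled to 100.
seqs = [(f"e{i}", "462dpi") for i in range(20)] + [
    (f"l{i}", "other_tp") for i in range(300)
]
picked = stratified_downsample(seqs, 100)
print(sum(p.startswith("e") for p in picked))  # 20: every early sequence survives
```

With equally sized timepoints this gives the 50/50 split suggested above. With a small early timepoint it keeps all of it and fills the rest from the later one, whereas uniformly sampling 100 of these 320 sequences would keep only about 6 early sequences on average.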

metasoarous commented 7 years ago

Sounds like you're investigating 1 in another project.

As for 2, our duplicity pie charts now account for our minadcl-thinned trees, as per #158.
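To illustrate why pie charts can address point 2 even when thinning removes many early leaves, here is one way such accounting could work, sketched in Python under stated assumptions: each sequence removed by thinning has already been assigned to a kept leaf that represents it (`rep_of`), and each sequence has a timepoint label and a duplicate count. All names here are hypothetical, and this is not necessarily how #158 implements it.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def pie_counts(
    kept: Iterable[str],
    rep_of: Mapping[str, str],
    timepoint: Mapping[str, str],
    multiplicity: Mapping[str, int],
) -> Dict[str, Counter]:
    """Per kept leaf, count sequences by timepoint, including thinned ones.

    `rep_of` maps each thinned-out sequence to the kept leaf standing in for
    it; `multiplicity` gives duplicate counts (missing entries count as 1).
    """
    counts: Dict[str, Counter] = {leaf: Counter() for leaf in kept}
    for leaf, c in counts.items():
        c[timepoint[leaf]] += multiplicity.get(leaf, 1)
    for removed, rep in rep_of.items():
        counts[rep][timepoint[removed]] += multiplicity.get(removed, 1)
    return counts


def pie_fractions(c: Counter) -> Dict[str, float]:
    """Slice sizes for one pie chart."""
    total = sum(c.values())
    return {tp: n / total for tp, n in c.items()}


tp = {"a": "462dpi", "b": "other_tp", "c": "other_tp", "d": "462dpi"}
counts = pie_counts(["a", "b"], {"c": "a", "d": "a"}, tp, {"a": 2})
print(pie_fractions(counts["a"]))  # {'462dpi': 0.75, 'other_tp': 0.25}
```

Because every removed sequence is still counted at its representative, the chart at each kept leaf reflects the timepoint composition of the clade it stands in for, regardless of which timepoint's leaves happened to survive thinning.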

Finally, Duncan and I have set things up to run timepoints separately, and have done this for the requested timepoints, so I'm going to close this issue.