Closed lauradoepker closed 7 years ago
Sigh...
Right as I was finishing up all the merged timepoints work, I thought, "Should I check to see if it would be useful to continue processing separate timepoints separately?" After consideration, I said to myself, "Nah! More trouble than it's worth. No one's going to want to see the timepoints separately." Should have checked...
There are a couple of different things that could be done here:
Option 2 is probably the way to go here.
@lauranoges @meganstumpf Does having separate datasets for merged vs 462dpi-alone timepoints seem like a fine solution? Then you could just select one or the other in the datasets dropdown at the top right, depending on what you want to look at.
There's some work to do here figuring out which timepoints we run and how the datasets get set up around them. I may also need to chat with @psathyrella about this in relation to #167 and how we model all the relationships in the data.
Yes @metasoarous that would be sufficient. Just to clarify, when you say "merged" do you always mean "all time points from a given patient in the same dataset"?
I'm on board with that solution as well, pending clarification of Laura's point. I'm assuming the 462dpi-alone set will bring back the hits for QA255.006VH? A major goal for me would be to have access to the data for QA255.006VH again.
Yes; that's what I mean by "merged" (see #10). This is what you presently see on the http://stoat:5555 deployment.
In a meeting with @meganstumpf and @lauranoges: picks were made on Jan 4th on (probably) stoat:5000. Timepoint 462dpi (the older one) is the only one she cares about. May 19th deadline.
It's gonna be hard to get that specific dataset back, so I'm going to try rerunning with just that timepoint. That means adding this ability to the build pipeline.
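As a rough illustration of what "rerunning with just that timepoint" might look like in the build pipeline, here's a minimal sketch. It assumes (this is not confirmed anywhere in this thread) that each sequence ID embeds its timepoint after a `|` separator; the function name and ID format are hypothetical.

```python
# Hypothetical sketch: filter the input records down to a single timepoint
# before rerunning the pipeline. Assumes sequence IDs look like "seq42|462dpi".

def filter_records_by_timepoint(records, timepoint):
    """records: iterable of (seq_id, sequence) pairs; keep only the given timepoint."""
    return [(sid, seq) for sid, seq in records if sid.split("|")[-1] == timepoint]

records = [("seq1|462dpi", "GATTACA"), ("seq2|1049dpi", "GATTTCA")]
print(filter_records_by_timepoint(records, "462dpi"))
# → [('seq1|462dpi', 'GATTACA')]
```

The same predicate could equally be applied at whatever stage of the real pipeline reads the per-patient sequence files.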
@metasoarous Just checking in - do we have an update on when the 462dpi-only dataset will be viewable?
@metasoarous and I just chatted and we discussed a few issues about merged trees:
1) When downsampling, we're curious how few leaves we can get away with and still generate a tree that is representative of the larger clonal family. I understand this is subjective, but is the 100-sequence cutoff good? Too big? Too small?
2) With the downsampling target size (i.e. 100 sequences) in mind, we're curious if we should be selecting equal numbers of leaves from each timepoint so as not to bias trees toward/away from certain datasets (i.e. the early dataset gets almost completely pruned out while the later dataset takes over). Maybe this concern will evaporate when we institute the duplicity pie charts that represent the sizes of thinned clades? Otherwise, should we try picking 50(?) sequences from each timepoint?
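To make point 2 concrete, the per-timepoint selection could be sketched roughly like this. This is only an illustration of the idea, not the project's actual thinning code (which uses minadcl); the function name and the `(leaf_id, timepoint)` input format are assumptions.

```python
# Hypothetical sketch: downsample leaves so each timepoint contributes an
# (approximately) equal share of the target, rather than letting one
# timepoint dominate the thinned tree.
import random
from collections import defaultdict

def downsample_equal_per_timepoint(leaves, target=100, seed=0):
    """leaves: list of (leaf_id, timepoint) pairs; returns selected leaf IDs."""
    rng = random.Random(seed)
    by_tp = defaultdict(list)
    for leaf_id, tp in leaves:
        by_tp[tp].append(leaf_id)
    per_tp = max(1, target // len(by_tp))  # e.g. 100 // 2 timepoints = 50 each
    picked = []
    for tp, ids in by_tp.items():
        picked.extend(rng.sample(ids, min(per_tp, len(ids))))
    return picked
```

For two timepoints and `target=100` this yields the 50/50 split mentioned above; a timepoint with fewer than its share of leaves just contributes everything it has.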
Sounds like you're investigating (1) in another project.
As for (2), our duplicity pie charts now account for our minadcl-thinned trees, as per #158.
Finally, Duncan and I have set things up to run timepoints separately, and have done this for the requested timepoints, so I'm going to close this issue.
@metasoarous Hi there - Megan (hopefully for the last time) on Laura's account.
Thanks! Megan