matsengrp / cft

Clonal family tree

Refinement of minadcl? #182

Closed metasoarous closed 5 years ago

metasoarous commented 7 years ago

It occurred to me that if we believe that sequences with higher duplicity are less likely to be the result of sequencing error, then perhaps we should take this into account as we select representatives.

@matsen How crazy do you think it would be to adapt min_adcl to accept weights? Based on my intuitive understanding of what min_adcl is doing, it seems like that could be translated into a well-defined objective function. But I'm not sure how easy it would be to adapt the current implementation to account for that information. Tell me if this is just way out of scope :-)

On the other hand, I imagine that if min_adcl is doing what we expect it to, it might actually be doing a decent job of picking out sequences of higher duplicity. I'd expect such sequences to tend to be in the "center" of sequencing error point mutations, as well as genuine biological sequences of lower fitness branching off these centers.

Now that I'm working on figuring out the clusters for min_adcl for the purpose of aggregating duplicity, we could actually start looking at whether we tend to pick up higher duplicity sequences. I'd imagine that demonstrating such a tendency might boost confidence in our methods.

I recognize that this isn't super high in priority right now, but I wanted to jot down the thought while it was on my mind.

matsen commented 7 years ago

I had the same thought. Like you, I don't think it's ultra-high priority.

One challenge is how we translate sequence quality into a weight that is somehow comparable to tree distance, which is what ADCL is actually about.

metasoarous commented 7 years ago

But doesn't the solution come down to thinking about the flow of mass on the tree? If so, it would seem that taking the duplicity into account in defining the mass of each tree leaf (rather than all leaves implicitly having mass 1) would get you there.
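
A minimal sketch of the idea above, for illustration only. It assumes ADCL can be read as the mass-weighted mean distance from each leaf to its closest selected leaf, with duplicity used as the leaf mass. The function names, the toy distance matrix, and the brute-force search are all hypothetical and are not taken from the min_adcl implementation.

```python
from itertools import combinations


def weighted_adcl(dist, masses, reps):
    """Mass-weighted average distance from each leaf to its closest leaf in reps.

    dist   : symmetric matrix of leaf-to-leaf tree distances (list of lists)
    masses : per-leaf mass, e.g. duplicity; all ones gives the unweighted case
    reps   : indices of the leaves chosen as representatives
    """
    total_mass = sum(masses)
    weighted_sum = sum(
        mass * min(dist[leaf][rep] for rep in reps)
        for leaf, mass in enumerate(masses)
    )
    return weighted_sum / total_mass


def min_weighted_adcl(dist, masses, k):
    """Exhaustively choose the k leaves minimizing weighted ADCL (toy sizes only)."""
    leaves = range(len(masses))
    return min(
        combinations(leaves, k),
        key=lambda reps: weighted_adcl(dist, masses, reps),
    )


# Toy star tree with branch lengths 1, 2 and 3 leading to leaves 0, 1 and 2.
dist = [
    [0, 3, 4],
    [3, 0, 5],
    [4, 5, 0],
]

unit_choice = min_weighted_adcl(dist, [1, 1, 1], k=1)       # every leaf has mass 1
weighted_choice = min_weighted_adcl(dist, [1, 1, 10], k=1)  # leaf 2 has duplicity 10
```

With unit masses the most central leaf (index 0) is chosen. Giving leaf 2 a duplicity of 10 moves the choice to leaf 2, which is the behaviour the weighting is meant to produce. This does not settle how duplicity should be scaled against tree distance.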

matsen commented 7 years ago

For those following along from home, ADCL is described in https://paperpile.com/shared/tS1gHB .

matsen commented 5 years ago

Is this still interesting? My sense is that minadcl isn't really used for much other than display, as the ancestral sequence lineage information is the most important, and that doesn't use adcl at all.

lauradoepker commented 5 years ago

Closing because Erick's comment is correct: we only use this for unseeded partis visualization.

metasoarous commented 5 years ago

This seemed theoretically interesting when we started thinking about it, but we agreed that it wasn't really a pressing issue, so it's just been sitting here. Happy to leave it closed to clear up the board space.