Open matsen opened 8 years ago
Yeah, sounds good.
I suppose I won't use the ROOT implementation that I know and love/hate...
related: #171 and #176
Thinking about it more, logistic regression would probably be a better fit than RF. But perhaps we should just try both.
Vladimir points out that there are methods for doing feature selection and clustering at the same time.
Could we use machine learning to automatically generate heuristics for clustering? We have a "gold standard" now with full partis, and the goal of more approximate clustering should be to replicate that. Thus, how about having the computers do the work of figuring out how to do that best?
I'll bet that we could just throw in
into a random forest classifier and have it pop out a nice predictor of whether two sequences belong to the same clonal family, one that doesn't require any expensive operations. Then we could use those classifiers for the more approximate clustering.
I specifically don't want this to be something that we run once to get parameter values that are then frozen for the rest of time. Rather, I hope it could be a partis command to generate these heuristics.
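
To make the idea concrete, here is a rough sketch of what "learn the heuristic from the gold standard" could look like, using the logistic regression variant mentioned in the comments so that it runs on the standard library alone. Everything in it is an assumption for illustration: the two pairwise features (`hamming_frac`, `len_diff`), the toy sequences, the `gold` labels standing in for a full partis partition, and all function names are hypothetical and are not part of partis.

```python
import itertools
import math

def pair_features(s1, s2):
    """Cheap, hypothetical features for a pair of sequences."""
    n = min(len(s1), len(s2))
    mismatches = sum(x != y for x, y in zip(s1, s2))
    hamming_frac = mismatches / n if n else 1.0
    len_diff = abs(len(s1) - len(s2)) / max(len(s1), len(s2), 1)
    return [hamming_frac, len_diff]

def labeled_pairs(seqs, gold):
    """Build one training row per pair; label 1 if the gold standard co-clusters them."""
    X, y = [], []
    for i, j in itertools.combinations(sorted(seqs), 2):
        X.append(pair_features(seqs[i], seqs[j]))
        y.append(1 if gold[i] == gold[j] else 0)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=3000):
    """Plain full-batch gradient descent on the logistic loss."""
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(wk * xk for wk, xk in zip(w, xi))) - yi
            for k, xk in enumerate(xi):
                grad_w[k] += err * xk
            grad_b += err
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad_w)]
        bias -= lr * grad_b / len(X)
    return w, bias

def same_family_prob(w, bias, s1, s2):
    """Predicted probability that two sequences are in the same clonal family."""
    x = pair_features(s1, s2)
    return sigmoid(bias + sum(wk * xk for wk, xk in zip(w, x)))

# Toy data: sequences and the partition a full (expensive) run would give (made up).
seqs = {
    "a1": "ACGTACGTAC", "a2": "ACGTACGTAT", "a3": "ACGTACGAAC",
    "b1": "TTGCATGCAA", "b2": "TTGCATGCAT", "b3": "TTGGATGCAA",
}
gold = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

X, y = labeled_pairs(seqs, gold)
w, bias = fit_logistic(X, y)
print(same_family_prob(w, bias, "ACGTACGTAC", "ACGTACGTAA"))  # near-identical pair
print(same_family_prob(w, bias, "ACGTACGTAC", "TTGCATGCAA"))  # distant pair
```

Since the fit is cheap, it could be rerun on each new data set rather than frozen, which is the point of making it a command. Swapping in a random forest would only change `fit_logistic` and `same_family_prob`; the pair-labeling step stays the same.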