maxentile opened this issue 9 years ago
"Large-margin component analysis" -- combining dimensionality reduction and metric learning-- http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_791.pdf
Distance metric learning: a comprehensive survey: http://vipl.ict.ac.cn/homepage/ACCV14Metric/ref/1_Distance%20metric%20learning%20a%20comprehensive%20survey_TR2006.pdf
Metric learning: A survey: http://web.cse.ohio-state.edu/~kulis/pubs/ftml_metric_learning.pdf
Nonlinear metric learning: http://www1.cse.wustl.edu/~kilian/papers/chilmnn.pdf
Other ideas for follow-on work:
Bayesian active distance metric learning: http://www.cs.cmu.edu/~liuy/uai2007_bayesian.pdf
Given two arbitrary points in configuration space, the "kinetic distance" between them can be approximately determined, but at some computational cost. We effectively want to learn a good kinetic distance metric over all of configuration space, which seems like a good fit for an active-learning approach.
Metric learning for kernel regression: http://jmlr.org/proceedings/papers/v2/weinberger07a/weinberger07a.pdf e.g. regressing on time lag
Kernel regression with sparse metric learning: http://www.cst.ecnu.edu.cn/~slsun/pubs/KRSML.pdf
Large-margin multitask metric learning: http://papers.nips.cc/paper/3935-large-margin-multi-task-metric-learning.pdf
Idea: "Tree-preserving metric learning"
In LMNN, we consider triplets (i,j,k) of points where (i,j) belong to the same class and k belongs to a different class, and we want to learn a metric under which pair (i,j) is closer together than pair (i,k) by at least a constant margin. We could modify this approach to accept hierarchical cluster assignments instead of class labelings.
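As a concrete sketch, the triplet constraint can be written as a hinge loss over a Mahalanobis metric parameterized by a PSD matrix M (the function names here are mine, not from the LMNN paper):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return d.dot(M).dot(d)

def triplet_hinge_loss(X, triplets, M, margin=1.0):
    """LMNN-style hinge loss over triplets (i, j, k): we want
    d_M(i, j) + margin <= d_M(i, k), i.e. the same-class pair (i, j)
    should be closer than the different-class pair (i, k) by the margin."""
    loss = 0.0
    for i, j, k in triplets:
        slack = mahalanobis_sq(X[i], X[j], M) + margin - mahalanobis_sq(X[i], X[k], M)
        loss += max(0.0, slack)
    return loss

# tiny example: the identity metric already satisfies the constraint
X = np.array([[0.0], [0.1], [5.0]])
triplets = [(0, 1, 2)]          # (i, j) same class, k different class
M = np.eye(1)
print(triplet_hinge_loss(X, triplets, M))  # 0.0 -- the hinge is inactive
```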
For LMNN, is there a way to supply an arbitrary loss function rather than 0/1 loss? E.g., with three classes A, B, C, can we say that mistakenly putting A close to B is worse than putting A close to C? If so, that would be nice: "tree-preserving metric learning" would just be the special case where the loss function is cophenetic distance or something similar.
Tactic: one approach might simply be to replace the "1" in the SDP constraint with the desired margin for each triplet (ref: http://www.cse.wustl.edu/~kilian/papers/spm2010.pdf, p. 4).
There would be several cool applications for solving this general problem.
1. Kinetic discriminatory metric learning: http://pubs.acs.org/doi/abs/10.1021/ct400132h -- given a time-ordered sequence of feature vectors, we want to find a distance metric on the feature vectors that preserves "kinetic distance," i.e. observations that are close in time should be close according to the learned metric. Previous approaches used a 0/1 loss at fixed lag times, but we could presumably do better if errors were weighted by the magnitude of the time lag.
2. "Tree-preserving metric learning" -- given a hierarchical cluster tree, find a distance metric such that points far apart in the tree are far apart in the metric.
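The per-triplet-margin tactic above can be sketched with cophenetic distances from scipy's hierarchical clustering, replacing the constant margin "1" with the gap in tree distance (a sketch under my own naming; none of the linked papers do exactly this):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.RandomState(0)
X = rng.randn(6, 2)

# hierarchical cluster tree and cophenetic distances between all pairs
Z = linkage(X, method='average')
coph = squareform(cophenet(Z))   # coph[i, j] = distance between i, j in the tree

def margin_hinge_loss(X, M, coph):
    """For each triplet (i, j, k) with coph[i, j] < coph[i, k], require
    d_M(i, k) >= d_M(i, j) + (coph[i, k] - coph[i, j]) -- i.e. the
    constant margin "1" is replaced by the gap in tree distance."""
    n = len(X)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if coph[i, j] < coph[i, k]:
                    dij = (X[i] - X[j]).dot(M).dot(X[i] - X[j])
                    dik = (X[i] - X[k]).dot(M).dot(X[i] - X[k])
                    loss += max(0.0, dij + (coph[i, k] - coph[i, j]) - dik)
    return loss

print(margin_hinge_loss(X, np.eye(2), coph))
```

Minimizing this over PSD matrices M (e.g. by projected gradient descent) would then be one concrete instance of "tree-preserving metric learning."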
Could apply this to learning the weight matrix M of what I'll call the "weighted Binet-Cauchy kernel".
The Binet-Cauchy kernel is:
```python
from numpy import sqrt, eye
from numpy.linalg import det

def BC(X, Y):
    return det(X.T.dot(Y)) / sqrt(det(X.T.dot(X)) * det(Y.T.dot(Y)))
```
where X and Y are matrices.
When X and Y are _n_×3 matrices containing residue coordinates, this computes something similar to RMSD, but with a couple of advantages: it's faster and simpler, and it's definitely a kernel. (RMSD is probably also a kernel, but I'm not sure.)
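A quick sanity check of the definition above (random coordinates, with BC restated so the snippet runs standalone):

```python
import numpy as np
from numpy import sqrt
from numpy.linalg import det

def BC(X, Y):
    # Binet-Cauchy kernel, as defined above
    return det(X.T.dot(Y)) / sqrt(det(X.T.dot(X)) * det(Y.T.dot(Y)))

rng = np.random.RandomState(0)
X = rng.randn(10, 3)   # 10 "residues" in 3-D
Y = rng.randn(10, 3)

print(np.isclose(BC(X, X), 1.0))       # True: self-similarity is 1
print(np.isclose(BC(X, Y), BC(Y, X)))  # True: det(X^T Y) = det(Y^T X), so BC is symmetric
```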
We can weight pairs of residues using a matrix M:
```python
def weighted_BC(X, Y, M=None):
    # note: "M == None" would trigger elementwise comparison for arrays; use "is None"
    if M is None:
        M = eye(len(X))
    return det(X.T.dot(M).dot(Y)) / sqrt(det(X.T.dot(M).dot(X)) * det(Y.T.dot(M).dot(Y)))
```
Question: what constraints must M satisfy for weighted_BC to be a valid kernel?
Guess: as long as M is positive semidefinite, it should be okay?
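One supporting observation: if M = L Lᵀ (i.e. M is PSD), then Xᵀ M Y = (Lᵀ X)ᵀ (Lᵀ Y), so weighted_BC is just the plain BC kernel applied to the transformed configurations LᵀX and LᵀY -- hence a kernel whenever the denominators are nonzero (strict positive definiteness suffices). A numeric spot-check, restating both functions so it runs standalone:

```python
import numpy as np
from numpy import sqrt, eye
from numpy.linalg import det

def BC(X, Y):
    return det(X.T.dot(Y)) / sqrt(det(X.T.dot(X)) * det(Y.T.dot(Y)))

def weighted_BC(X, Y, M=None):
    if M is None:
        M = eye(len(X))
    return det(X.T.dot(M).dot(Y)) / sqrt(det(X.T.dot(M).dot(X)) * det(Y.T.dot(M).dot(Y)))

rng = np.random.RandomState(42)
n = 8
X, Y = rng.randn(n, 3), rng.randn(n, 3)

L = rng.randn(n, n)
M = L.dot(L.T)            # positive definite (almost surely)

# X^T M Y = (L^T X)^T (L^T Y): the weighted kernel is plain BC on transformed inputs
print(np.isclose(weighted_BC(X, Y, M), BC(L.T.dot(X), L.T.dot(Y))))  # True
```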
Neighborhood component analysis: http://papers.nips.cc/paper/2566-neighbourhood-components-analysis.pdf -- see section 5 for extensions to continuous labels (e.g. time-lags)
Kinetic discriminatory metric learning: http://pubs.acs.org/doi/abs/10.1021/ct400132h
Questions: