Towards possible M2TML implem

smarie commented 4 years ago

Hi metric-learn team! We recently discussed with tslearn the possibility of implementing M2TML metric learning, which generalizes the "large margin" concepts of LMNN (Weinberger and Saul) in order to learn a combination of basic metrics that can be sparse or not, and linear or not. The thesis and associated papers (such as this one) present results for time series classification, so the set of basic metrics is oriented towards time series; however, the method is generic and therefore would not really make sense in tslearn. It seems that it would rather belong here.

I would therefore be interested in following your developments in order to determine when would be a good time to propose a PR. I have no bandwidth right now, but things evolve, and experience shows that such PRs need to be prepared a little (in scikit-learn, PRs take months or years to be merged :) ).

From what I saw in your LMNN implementation, you seem to reimplement the optimization solvers yourselves in plain Python for now. Is there a plan to rely on more "robust"/"fast" solvers such as ipopt or simply scipy (although this discussion makes me fear that the implementation is not very efficient)?

Also, in our work we rely on a "pairwise" representation of data, where each sample in that space is a pair in the original space. Is there such a concept already in metric-learn ? That would certainly ease any implementation.

Finally, is there a plan to have a section in metric-learn to expose basic metrics, whether they are dissimilarities (euclidean, manhattan...) or similarities (corT, kernels...)?

Thanks very much for all your answers! And again, there is absolutely no hurry: as I said above, this is more to "prepare the ground".

bellet commented 4 years ago

Hi @smarie, thank you for your interest!

There are quite a few questions in your post so I'll try to reply the best I can to each of them.

Hi metric-learn team! We recently discussed with tslearn the possibility of implementing M2TML metric learning, which generalizes the "large margin" concepts of LMNN (Weinberger and Saul) in order to learn a combination of basic metrics that can be sparse or not, and linear or not. The thesis and associated papers (such as this one) present results for time series classification, so the set of basic metrics is oriented towards time series; however, the method is generic and therefore would not really make sense in tslearn. It seems that it would rather belong here.

I would therefore be interested in following your developments in order to determine when would be a good time to propose a PR. I have no bandwidth right now, but things evolve, and experience shows that such PRs need to be prepared a little (in scikit-learn, PRs take months or years to be merged :) ).

I am not very familiar with M2TML but after a quick look it seems to be learning a metric in the form of a combination of a set of pre-defined metrics. Implementing a general metric learning algorithm in this context (where the pre-defined metrics can be specified by the user, e.g. for time-series or something else) would indeed be interesting for metric-learn as such a formulation can cover many use-cases.

(Note that PR #278 (currently in progress) is implementing SCML, which learns a sparse linear combination of rank-one Mahalanobis metrics, so in this sense it is a bit related, although limited to Mahalanobis distances.)

From what I saw in your LMNN implementation, you seem to reimplement the optimization solvers yourselves in plain Python for now. Is there a plan to rely on more "robust"/"fast" solvers such as ipopt or simply scipy (although this discussion makes me fear that the implementation is not very efficient)?

We are definitely interested in making our implementations more scalable. Using off-the-shelf solvers can be an option. For the particular case of LMNN, we would like to refactor the code anyway, as it is currently very hard to understand (see #210). A potential alternative could be to just code functions that compute the objective and the gradient, and solve the problem using for instance scipy's L-BFGS, as done in an open PR to include LMNN in sklearn (which is currently stuck): https://github.com/scikit-learn/scikit-learn/pull/8602. Benchmarking this code against our current implementation (and replacing it if it is clearly better) would be interesting.
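To make the "objective plus gradient, handed to scipy" idea concrete, here is a minimal sketch. It is not LMNN and not the code from either project: the loss is a toy pairwise objective (similar pairs are pulled together, dissimilar pairs are pushed beyond a unit margin with a squared hinge), and the names `pair_loss_and_grad`, `diffs` and `labels` are made up for the illustration. Only `scipy.optimize.minimize` with `method="L-BFGS-B"` and `jac=True` is an actual library call.

```python
import numpy as np
from scipy.optimize import minimize


def pair_loss_and_grad(flat_L, diffs, labels, n_components):
    """Return (loss, gradient) of a toy pairwise objective for a linear map L.

    diffs  : (n_pairs, n_features) array of differences x_i - x_j
    labels : +1 for similar pairs, -1 for dissimilar pairs
    """
    n_features = diffs.shape[1]
    L = flat_L.reshape(n_components, n_features)
    proj = diffs @ L.T                    # (n_pairs, n_components)
    d2 = np.sum(proj ** 2, axis=1)        # squared distances under L
    similar = labels == 1
    hinge = np.maximum(0.0, 1.0 - d2[~similar])
    loss = d2[similar].sum() + np.sum(hinge ** 2)
    # d(d2_k)/dL = 2 * outer(proj_k, diffs_k); chain rule gives per-pair weights
    weights = np.zeros(len(labels))
    weights[similar] = 2.0
    weights[~similar] = -4.0 * hinge
    grad = (proj * weights[:, None]).T @ diffs
    return loss, grad.ravel()


# Toy data: two classes split on the first feature (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)
i, j = np.triu_indices(len(X), k=1)
diffs = X[i] - X[j]
labels = np.where(y[i] == y[j], 1, -1)

L0 = np.eye(3).ravel()
# jac=True tells scipy that the function returns (loss, gradient) together
result = minimize(pair_loss_and_grad, L0, args=(diffs, labels, 3),
                  jac=True, method="L-BFGS-B")
L_learned = result.x.reshape(3, 3)
```

With this structure, swapping in a different objective only means changing the one function that returns the loss and its gradient, which is also what would make a benchmark against a hand-written solver straightforward.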

Also, in our work we rely on a "pairwise" representation of data, where each sample in that space is a pair in the original space. Is there such a concept already in metric-learn ? That would certainly ease any implementation.

I guess here you mean representing pairs of points as a vector of their distances according to the different pre-defined metrics. We do not currently have this, but it can be done easily inside the fit, unless there is a strong drawback to doing things this way?
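As a rough illustration of that reading of the "pairwise" representation, the sketch below maps each pair of samples to a vector holding one value per pre-defined metric. The helper `pairwise_representation` and the `BASIC_METRICS` dictionary are hypothetical names, not part of metric-learn; the three distance functions are the ones from `scipy.spatial.distance`.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical registry of basic metrics; a user could plug in their own
BASIC_METRICS = {
    "euclidean": distance.euclidean,
    "manhattan": distance.cityblock,
    "chebyshev": distance.chebyshev,
}


def pairwise_representation(X, pairs, metrics=BASIC_METRICS):
    """Map each pair (i, j) of rows of X to a vector of basic-metric values."""
    return np.array([[metric(X[i], X[j]) for metric in metrics.values()]
                     for i, j in pairs])


X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
pairs = [(0, 1), (0, 2), (1, 2)]
P = pairwise_representation(X, pairs)
# P has shape (n_pairs, n_metrics); the first row is [5., 7., 4.]
```

Each row of `P` is then one "sample" in the pairwise space, which is what a learner could build internally at the start of `fit`.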

Finally, is there a plan to have a section in metric-learn to expose basic metrics, whether they are dissimilarities (euclidean, manhattan...) or similarities (corT, kernels...)?

Not really at this point. Many of the classic metrics are exposed in scipy and sklearn already. In metric-learn, we expose the (learned) metric function through the get_metric() method of the metric learners. Maybe you could give more details on what you have in mind?

smarie commented 4 years ago

Thanks @bellet for the clear and detailed answer.

I guess here you mean representing pairs of points as a vector of their distances according to the different pre-defined metrics. We do not currently have this, but it can be done easily inside the fit, unless there is a strong drawback to doing things this way?

If I remember correctly from our MATLAB implementation, you might wish to pre-compute metrics for efficiency, especially when DTW comes into play. Then, when this goes into a cross-validation loop, the various calls to fit should be able to rely on this pre-computed data representation so as not to recompute the distances. And cross-validation itself needs special care, especially concerning pairs made from one sample in train and one sample in test.

But this may be specific to our method. I was just curious to see if others had come to the same conclusion, which would have led metric-learn to implement special data structures and cross-validation operators to handle this.
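A minimal sketch of what I mean, under stated assumptions: plain Euclidean distance stands in for an expensive metric such as DTW, a hand-rolled 1-NN stands in for the learner, and the helper names (`precompute_distances`, `one_nn_predict`) are invented for the example. The point is only the indexing: the full matrix is computed once, and each fold slices out the train/train block and the test/train block.

```python
import numpy as np


def precompute_distances(X):
    """Full pairwise distance matrix, computed once (Euclidean as a stand-in)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def one_nn_predict(D_test_train, y_train):
    """1-NN prediction from a (n_test, n_train) block of distances."""
    return y_train[np.argmin(D_test_train, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.repeat([0, 1], 20)

D = precompute_distances(X)  # the expensive step, done only once

folds = np.array_split(rng.permutation(len(X)), 5)
scores = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for m, f in enumerate(folds) if m != k])
    # Train/train pairs: the only block a fit on this fold should see
    D_train = D[np.ix_(train_idx, train_idx)]
    # Test/train pairs: used at prediction time, never during fit
    D_cross = D[np.ix_(test_idx, train_idx)]
    y_pred = one_nn_predict(D_cross, y[train_idx])
    scores.append(np.mean(y_pred == y[test_idx]))
```

The delicate part is the mixed block `D_cross`: those pairs combine one train sample and one test sample, so they must stay out of anything learned on the fold.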

Not really at this point. Many of the classic metrics are exposed in scipy and sklearn already. In metric-learn, we expose the (learned) metric function through the get_metric() method of the metric learners. Maybe you could give more details on what you have in mind?

Fair enough. I had in mind a one-stop shop to:

- get all basic metrics (not those related to time series, which would go somewhere like tslearn, but all the others: euclidean, manhattan, etc.)
- have a data model of metrics, i.e. know which ones are similarities, which ones are dissimilarities, which ones are distances, etc.
- get operators to transform a learnt metric into a kernel for usage in e.g. SVC. For example, do you currently guarantee that get_metric returns a similarity in all learners? Having a to_kernel(metric) method able to detect if the provided metric is a dissimilarity or a similarity and to apply an appropriate (customizable) transform would be great
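To illustrate the last point, here is a rough sketch of what such a `to_kernel` helper could look like. Everything in it is hypothetical: no such function exists in metric-learn as far as this thread indicates, the `kind` argument has to be given explicitly because nothing here detects it automatically, and the Gaussian transform `exp(-gamma * d**2)` is just one possible choice that is not guaranteed to yield a positive semi-definite kernel for an arbitrary dissimilarity.

```python
import numpy as np


def to_kernel(metric, kind="dissimilarity", gamma=1.0):
    """Hypothetical helper: wrap a pairwise function so it behaves like a kernel.

    kind  : "similarity" (returned unchanged) or "dissimilarity"
    gamma : scale of the Gaussian transform applied to dissimilarities
    """
    if kind == "similarity":
        return metric
    if kind == "dissimilarity":
        return lambda a, b: np.exp(-gamma * metric(a, b) ** 2)
    raise ValueError("kind must be 'similarity' or 'dissimilarity'")


def gram_matrix(kernel, X, Y):
    """Evaluate the kernel on every pair (row of X, row of Y)."""
    return np.array([[kernel(a, b) for b in Y] for a in X])


def euclidean(a, b):
    # Stand-in for a learnt metric function
    return float(np.linalg.norm(a - b))


kernel = to_kernel(euclidean, kind="dissimilarity", gamma=0.5)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gram_matrix(kernel, X, X)
```

A Gram matrix like `K` is the kind of input that sklearn's `SVC(kernel="precomputed")` accepts, which is why a data model saying whether a metric is a similarity or a dissimilarity would be useful upstream of this step.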

I saw this in sklearn; it is also a place where such things could fit: https://scikit-learn.org/stable/modules/metrics.html

scikit-learn-contrib / metric-learn

Towards possible M2TML implem #284