Open GCBallesteros opened 6 years ago
You are correct that multi-dimensional arrays are not supported for labels in the current implementation. Hopefully a future version could cope with this, although ultimately this would violate the sklearn API, so it may need to be handled in a different way. In the meantime I can offer you a workaround. If you check issue #58 you can find some discussion of merging datasets that have different metrics. For your particular case I would suggest that this comment provides an outline for what you want to do -- just substitute the metric you want to use for the data, and the metric you want to use for the labels ('l2' in this case), for the 'bray-curtis' and 'jaccard' that were used in the example. You will likely want to play with the mix_ratio to get a good balance between the data and the labels.
Thanks for the suggestion and the quick reply. I will try it as soon as I can and come back with the result. Cheers.
I've been trying to do a fit for each of my targets and then intersect all of the results, without much success. I just get a big blob that actually looks worse than the results I get when I do a fit without passing the targets.
I was thinking of going into the code and modifying umap_.fit to change the call to sklearn.metrics.pairwise_distances so that distances are computed as l2norm(t1-t2), where t1 and t2 are rows of my targets matrix, using sklearn.metrics.pairwise.euclidean_distances instead. Would this make sense?
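To make the proposed distance concrete, here is a minimal NumPy sketch of l2norm(t1-t2) computed between every pair of rows of a targets matrix. The function name target_l2_distances and the array Y_demo are made up for illustration; this is not the code path UMAP uses internally, and sklearn.metrics.pairwise.euclidean_distances should give the same matrix.

```python
import numpy as np


def target_l2_distances(targets):
    """Pairwise L2 distances between rows of an (n_samples, n_targets) array.

    Hypothetical helper for illustration only; not part of umap or sklearn.
    """
    targets = np.asarray(targets, dtype=float)
    # Broadcast to shape (n_samples, n_samples, n_targets): every row minus every row.
    diffs = targets[:, None, :] - targets[None, :, :]
    # L2 norm over the target axis gives an (n_samples, n_samples) distance matrix.
    return np.linalg.norm(diffs, axis=-1)


# Made-up targets: three samples, two real-valued targets each.
Y_demo = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = target_l2_distances(Y_demo)
```

The result is symmetric with a zero diagonal, so each sample's target vector is treated as a single point rather than as separate labels.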
fit1 = umap.UMAP(metric='l2').fit(X_umap ,y=np.squeeze(Y_umap[:, 0]))
fit2 = umap.UMAP(metric='l2').fit(X_umap ,y=np.squeeze(Y_umap[:, 1]))
fit3 = umap.UMAP(metric='l2').fit(X_umap ,y=np.squeeze(Y_umap[:, 2]))
# Intersect all graphs
intersection = umap.umap_.general_simplicial_set_intersection(fit1.graph_, fit2.graph_, weight=0.5)
intersection = umap.umap_.general_simplicial_set_intersection(intersection, fit3.graph_, weight=1/3.)
intersection = umap.umap_.reset_local_connectivity(intersection)
embedding = umap.umap_.simplicial_set_embedding(fit1._raw_data, intersection, fit1.n_components,
fit1.initial_alpha, fit1._a, fit1._b,
fit1.repulsion_strength, fit1.negative_sample_rate,
200, 'random', np.random, fit1.metric,
fit1._metric_kwds, False)
Ah, I see. I think you want something more like:
fit1 = umap.UMAP(metric='l2').fit(X_umap)
fit2 = umap.UMAP(metric='l2').fit(Y_umap)
intersection = umap.umap_.general_simplicial_set_intersection(fit1.graph_, fit2.graph_, weight=0.25)
intersection = umap.umap_.reset_local_connectivity(intersection)
embedding = umap.umap_.simplicial_set_embedding(fit1._raw_data, intersection, fit1.n_components,
fit1.initial_alpha, fit1._a, fit1._b,
fit1.repulsion_strength, fit1.negative_sample_rate,
200, 'random', np.random, fit1.metric,
fit1._metric_kwds, False)
where the weight is a little arbitrary (you may have to play with it a little). That may well be essentially what you were describing doing above.
That worked beautifully! Thanks!
One question remains: how can I test new, unseen data points? I tried using fit1.transform(test_features) because it was the only obvious thing to do, but that didn't work. Any ideas?
Thanks again for the awesome code!
I think, unfortunately, that transforming new points through this custom pipeline is going to be non-trivial. It can be done, but I will have to work out exactly what incantations one would need to do so.
This seems to no longer work as is and instead throws AttributeError: 'UMAP' object has no attribute 'initial_alpha'.
Is there a way to get initial_alpha from somewhere, or should I set it arbitrarily? I couldn't find anything about the parameter in the documentation.
Additionally, simplicial_set_embedding() now seems to require the parameters densmap, densmap_kwds and output_dens even if densMAP is not used?
Is the multi-label supervised/semi-supervised learning option available now?
I think the best bet right now is to intersect with the label data via the model composition (i.e. build a model on data, a different model on labels, and use the * operator on the models) -- see https://umap-learn.readthedocs.io/en/latest/composing_models.html
Thanks for your suggestions.
Hi,
I'm working on a regression problem with multiple real valued targets. An exception is thrown by UMAP (attached below). I assume that it happens because I'm passing a multidimensional array as labels. Am I doing something wrong or is this mode not supported by the algorithm/implementation?
Thanks for everything!
Edit: After digging into the parameters for umap I found target_metric, which I set to 'l2', but I still get an error when my target has shape (n_samples, n_targets).