billypeanut opened this issue 3 years ago (status: Open)
Bounded metrics are often a bad choice for an output metric. You would probably be better off doing something akin to the hyperbolic space example in the docs: find a space with more amenable properties that has a convenient isometry to the one you want, and embed into that instead.
Thanks for the super fast response. I did also try the Haversine output_metric, which was quite nice because it forces the points onto the sphere, which gives me nice cosine properties. I think I need to think about the problem some more. Thanks again! BP
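For anyone following along, here is a rough sketch of why the Haversine output plays nicely with cosine distance. It assumes the embedding is two angle columns, as in the sphere example in the UMAP docs, and maps them to 3D unit vectors. The helper names and the random stand-in angles are made up for illustration; a real run would pass in the `embedding_` of a model fitted with `output_metric="haversine"` instead.

```python
import numpy as np

def haversine_embedding_to_sphere(embedding):
    """Map (theta, phi) angle pairs to points on the unit sphere in 3D.

    Assumption: column 0 is the polar angle and column 1 the azimuth,
    following the sphere example in the UMAP docs.
    """
    theta = embedding[:, 0]
    phi = embedding[:, 1]
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.column_stack([x, y, z])

def cosine_distance_matrix(points):
    """Pairwise cosine distance, 1 - cos(angle), between the rows of points."""
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Hypothetical stand-in for a fitted model's embedding_: five random angle pairs.
rng = np.random.default_rng(0)
fake_embedding = rng.uniform(0.0, np.pi, size=(5, 2))

sphere_points = haversine_embedding_to_sphere(fake_embedding)
sphere_norms = np.linalg.norm(sphere_points, axis=1)
sphere_cosine = cosine_distance_matrix(sphere_points)
```

Every mapped point has norm 1, so the cosine distances between them are always well defined and depend only on the angle between points, which is the "nice cosine properties" part.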
Hi there,
I'm getting some strange results when I set the output metric to "cosine", and I'm unsure whether I'm misinterpreting what this does (probable) or there is a mistake in the code (less probable). I assume the output_metric argument specifies the distance measure that should be used to interpret the output points.

As a test, I passed in a pre-calculated distance matrix: [[0,1,2,1], [1,0,1,2], [2,1,0,1], [1,2,1,0]]. This could describe 4 points, A, B, C and D, equally spaced around an origin point and measured with a cosine distance metric. I would therefore have expected the output points from UMAP to look something like that arrangement, with pairwise cosine distances that closely match the input distance matrix. Instead, I get 4 points all along the same line. If I normalise them, they end up with almost exactly the same coordinates, and therefore a cosine distance of 0 between every pair.

Am I right in what I'm expecting as output, or am I misinterpreting things? One thing that seems to be unaccounted for is that a Euclidean space is unbounded, whereas cosine distance must lie between 0 and 2, so I don't know how UMAP decides what counts as as-far-apart-as-possible in a cosine output space. Perhaps it expects points that should end up at an output distance of 2 in the cosine space to have a distance of 'Inf' in the input distance matrix, and since the largest distance in my matrix is 2, that is treated as quite close? I can't see any other input argument that tells UMAP that 2 is actually the farthest apart two points can be, and since the input is a pre-calculated matrix, I can't supply any more information about it.
Thanks, BP
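To make the test case concrete, here is a small NumPy-only check of the two claims above: that four points equally spaced around the origin reproduce the input matrix under cosine distance, and that points lying along one ray from the origin all have cosine distance 0 to each other. The `collinear` array is a made-up stand-in for the degenerate result, not actual UMAP output.

```python
import numpy as np

def cosine_distance_matrix(points):
    """Pairwise cosine distance, 1 - cos(angle), between the rows of points."""
    unit = points / np.linalg.norm(points, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# A, B, C, D at 0, 90, 180 and 270 degrees on the unit circle.
angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])
points = np.column_stack([np.cos(angles), np.sin(angles)])

# The pre-calculated distance matrix from the post.
expected = np.array([[0, 1, 2, 1],
                     [1, 0, 1, 2],
                     [2, 1, 0, 1],
                     [1, 2, 1, 0]], dtype=float)

circle_distances = cosine_distance_matrix(points)

# Hypothetical stand-in for the observed output: four points on a single ray.
collinear = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
collinear_distances = cosine_distance_matrix(collinear)

print(np.round(circle_distances, 6))
print(np.round(collinear_distances, 6))
```

The first matrix matches the input up to floating-point error, and the second is all zeros, which is the collapse described in the post.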