jlmelville / uwot

An R package implementing the UMAP dimensionality reduction method.
https://jlmelville.github.io/uwot/
GNU General Public License v3.0

Differences in model parameters when calling umap() cause umap_transform() to error #97

Closed. AustinHartman closed this issue 2 years ago.

AustinHartman commented 2 years ago

Hi, thanks for your work on this package. I recently installed v0.1.13 and noticed some additional items in the returned UMAP models. When umap is called with precomputed nearest neighbors, the returned model can be used as input to umap_transform without problems. When umap is called without precomputed nearest neighbors, however, the resulting model differs slightly, and umap_transform seems to error out somewhere around here. Increasing num_precomputed_nns from 0 to 1 in the failing model seemed to fix the issue. Any thoughts on what's going on here? Thanks in advance!

library(uwot)

X_train <- as.matrix(iris[c(1:10, 51:60), -5])
X_test <- as.matrix(iris[101:110, -5])

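# Precompute training neighbors with Annoy, keeping the index for later queries
train_nn <- uwot:::annoy_nn(X = X_train, k = 4,
                            metric = "euclidean", n_threads = 0,
                            ret_index = TRUE)

# Model built from the precomputed neighbors only (no input data)
umap_train_x_null <- umap(X = NULL, nn_method = train_nn, ret_model = TRUE,
                          n_neighbors = 4)

# Model built directly from the input data
umap_train_x <- umap(X = X_train, ret_model = TRUE,
                     n_neighbors = 4)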

# Get query neighbors
query_ref_nn <- uwot:::annoy_search(X = X_test, k = 4,
                                    ann = train_nn$index, n_threads = 0)
row.names(query_ref_nn$dist) <- row.names(X_test)

# Success
umap_test_1 <- umap_transform(X = NULL, model = umap_train_x_null,
                              nn_method = query_ref_nn)

# Error in umap_transform(X = NULL, model = umap_train_x, nn_method = query_ref_nn) : 
#   Expecting 
umap_test_2 <- umap_transform(X = NULL, model = umap_train_x,
                              nn_method = query_ref_nn)

# Success
umap_train_x$num_precomputed_nns <- 1
umap_test_3 <- umap_transform(X = NULL, model = umap_train_x,
                              nn_method = query_ref_nn)

sessionInfo():

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] uwot_0.1.13  Matrix_1.4-2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3     codetools_0.2-18 lattice_0.20-45  digest_0.6.29    grid_4.1.3       evaluate_0.15   
 [7] rlang_1.0.3      cli_3.3.0        rstudioapi_0.13  rmarkdown_2.14   RcppAnnoy_0.0.19 tools_4.1.3     
[13] xfun_0.31        fastmap_1.1.0    compiler_4.1.3   htmltools_0.5.2  knitr_1.39 

jlmelville commented 2 years ago

My thoughts are: I completely overlooked this as a possible way of doing things, namely building the model from the input data and then transforming with precomputed query neighbors. (The increasingly horrid rat's nest of logic covering all the possible ways to provide data to umap and umap_transform is a stellar example of how not to program a computer.) Sorry for breaking this, and thank you not only for reporting it but also for working out what the problem was. The error message was completely useless too, so whatever the opposite of kudos is, it goes to me all round.

I think this is now fixed on the master branch. There will be a new submission to CRAN as soon as I also complete #96 (which should mean that I don't have to wait a couple of months before my next submission).

jlmelville commented 2 years ago

@AustinHartman version 0.1.14 of uwot is now on CRAN and should contain this fix.

AustinHartman commented 2 years ago

Great, this resolves my issue. Thank you for such a fast fix!