predict() on a umap object with n_components=1 gets two errors -- Looks like missing drop=F

JenniferSLyon commented 5 years ago

Based on the example in the vignette:

library(umap)
iris.data = iris[, grep("Sepal|Petal", colnames(iris))]
iris.labels = iris[, "Species"]
custom.config = umap.defaults
custom.config$n_components = 1
iris.umap = umap(iris.data, config=custom.config)

set.seed(19)
iris.wnoise = iris.data + matrix(rnorm(150*40, 0, 0.1), ncol=4)
colnames(iris.wnoise) = colnames(iris.data)
iris.wnoise.umap = predict(iris.umap, iris.wnoise)

Error in colMeans(embedding[knn.indexes[i, ], ]) :
  'x' must be an array of at least two dimensions

> traceback()
6: stop("'x' must be an array of at least two dimensions")
5: colMeans(embedding[knn.indexes[i, ], ])
4: make.initial.spectator.embedding(umap$layout, spectator.knn$indexes)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

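Looking at make.initial.spectator.embedding, it looks like a drop = FALSE is missing. With n_components=1 the layout is a one-column matrix, so embedding[knn.indexes[i, ], ] is dropped to a plain vector and colMeans() rejects it. Here is the function with the fix added (on the line marked ## <-------):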

trace(umap:::make.initial.spectator.embedding, edit=T)

function (embedding, knn.indexes)
{
    result = matrix(0, nrow = nrow(knn.indexes), ncol = ncol(embedding))
    rownames(result) = rownames(knn.indexes)
    knn.indexes = knn.indexes[, 2:ncol(knn.indexes), drop = FALSE]
    for (i in 1:nrow(result)) {
        result[i, ] = colMeans(embedding[knn.indexes[i, ], , drop = FALSE])  ## <------- added drop = FALSE
    }
    result
}
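A minimal, self-contained illustration of the first failure in base R. The 4 x 1 matrix stands in for umap$layout with n_components = 1 and the pair of neighbour indexes is arbitrary; both are made-up values for illustration, not taken from the package. It shows that `[` drops a single-column selection to a vector unless drop = FALSE is given, which is exactly what colMeans() then rejects.

```r
# Toy stand-in for umap$layout with n_components = 1 (illustrative values only)
embedding <- matrix(c(0.5, 1.5, 2.5, 3.5), ncol = 1)
neighbours <- c(2, 3)  # hypothetical knn indexes for one spectator point

# Default subsetting drops the 2 x 1 selection to a plain numeric vector
dropped <- embedding[neighbours, ]
print(is.matrix(dropped))  # FALSE

# colMeans() refuses a vector, reproducing the error from the report
err <- tryCatch(colMeans(dropped), error = function(e) conditionMessage(e))
print(err)  # "'x' must be an array of at least two dimensions"

# With drop = FALSE the result stays a 2 x 1 matrix and colMeans() works
kept <- embedding[neighbours, , drop = FALSE]
print(colMeans(kept))  # 2
```

The same guard is worth applying anywhere a matrix can legitimately have a single row or column.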

This change leads to a new error:

> iris.wnoise.umap = predict(iris.umap, iris.wnoise)
Error in temp.embedding[, temp.index] <- result[, indeces[i]] :
  incorrect number of subscripts on matrix

> traceback()
4: naive.simplicial.set.embedding(graph, embedding, config, fix.observations = V)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

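It also looks like a drop = FALSE is missing in naive.simplicial.set.embedding. In that function result = t(embedding) is a matrix with a single row, so result[, seq_len(fix.observations + 1)] collapses to a vector and the later temp.embedding[, temp.index] assignment fails. The fix is on the line marked ## <-----: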

naive.simplicial.set.embedding
function (g, embedding, config, fix.observations = NULL)
{
    if (config$n_epochs == 0) {
        return(embedding)
    }
    result = t(embedding)
    gmax = max(g$coo[, "value"])
    g$coo[g$coo[, "value"] < gmax/config$n_epochs, "value"] = 0
    g = reduce.coo(g)
    eps = cbind(g$coo, eps = make.epochs.per.sample(g$coo[, "value"], config$n_epochs))
    if (is.null(fix.observations)) {
        result = naive.optimize.embedding(result, config, eps)
    }
    else {
        eps = eps[eps[, "from"] > fix.observations, ]
        indeces = seq(fix.observations + 1, ncol(result))
        seeds = column.seeds(result[, indeces, drop = FALSE], key = config$transform_state)
        temp.index = fix.observations + 1
        temp.embedding = result[, seq_len(fix.observations + 1), drop = FALSE]  ## <----- added drop=FALSE
        temp.eps = split.data.frame(eps, eps[, "from"])
        for (i in seq_along(indeces)) {
            temp.embedding[, temp.index] = result[, indeces[i]]
            set.seed(seeds[i])
            i.eps = temp.eps[[as.character(indeces[i])]]
            if (!is.null(i.eps)) {
                i.eps[, "from"] = temp.index
                temp.result = naive.optimize.embedding(temp.embedding, config, i.eps)
            }
            result[, indeces[i]] = temp.result[, temp.index]
        }
    }
    colnames(result) = g$names
    t(result)
}
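The second failure can be reproduced the same way in base R. In this sketch the 4 x 1 matrix again stands in for a one-component layout, and fix.observations = 2 is a hypothetical value chosen only for illustration; the variable names mirror those in naive.simplicial.set.embedding but the snippet does not call the package.

```r
# Toy stand-in for umap$layout with n_components = 1 (illustrative values only)
embedding <- matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 1)
result <- t(embedding)            # 1 x 4 matrix, as after result = t(embedding)
fix.observations <- 2             # hypothetical: first two columns are fixed
temp.index <- fix.observations + 1

# Without drop = FALSE, the single-row selection collapses to a vector ...
temp.embedding <- result[, seq_len(fix.observations + 1)]
print(is.matrix(temp.embedding))  # FALSE

# ... so the two-index assignment from the loop body fails
err <- tryCatch({
  temp.embedding[, temp.index] <- result[, 4]
  "no error"
}, error = function(e) conditionMessage(e))
print(err)  # "incorrect number of subscripts on matrix"

# With drop = FALSE the 1 x 3 matrix shape survives and the assignment works
temp.embedding <- result[, seq_len(fix.observations + 1), drop = FALSE]
temp.embedding[, temp.index] <- result[, 4]
print(dim(temp.embedding))        # 1 3
```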

With these two changes, predict() runs without error and returns values. I am not sure whether there are deeper issues with predicting with n_components=1, or whether these two changes are sufficient.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /mnt/drive2/r-project/R-3.6.1/lib/libRblas.so
LAPACK: /mnt/drive2/r-project/R-3.6.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base

other attached packages:
[1] umap_0.2.3       colorspace_1.4-1

loaded via a namespace (and not attached):
 [1] compiler_3.6.1  Matrix_1.2-17   tools_3.6.1     reticulate_1.13
 [5] Rcpp_1.0.2      RSpectra_0.15-0 grid_3.6.1      jsonlite_1.6
 [9] openssl_1.4.1   lattice_0.20-38 askpass_1.1

tkonopka commented 5 years ago

Thanks for pointing that out. Yes, those two drop=FALSE will fix this. Would you like to make a pull request, or should I go ahead and edit?

JenniferSLyon commented 5 years ago

You can just go ahead and edit.

tkonopka/umap, issue #10