tidymodels / embed

Extra recipes for predictor embeddings
https://embed.tidymodels.org

step_umap crashing Rstudio #135

Open mkhansa opened 2 years ago

mkhansa commented 2 years ago

Hi guys, 1- RStudio is crashing after using step_umap(). I'm getting "R Session Aborted, R encountered a fatal error..." Code:

library(recipes)
library(dplyr)
library(ggplot2)
library(embed)

recipe(Species ~ ., data = iris) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  step_umap(all_numeric_predictors(), num_comp = 2) %>%
  prep(training = iris)

2- step_pls is not working (step_pls() failed: Error in loadNamespace(x) : there is no package called ‘mixOmics’), and mixOmics does not seem to be available any more.

EmilHvitfeldt commented 2 years ago

Hello @mkhansa 👋 that is unfortunate, it shouldn't have crashed the session altogether.

The error you are getting is telling you that it can't find the mixOmics package. And it is coming from step_pls() which you haven't shown. You might have posted the wrong reprex.

Try installing mixOmics and try running the code again.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mixOmics")
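
Before re-running the recipe, it can help to confirm that the package is actually visible to the current R session. The following is a minimal sketch in base R; `missing_packages()` is a hypothetical helper name used only for illustration, not part of {embed} or {recipes}.

```r
# Hypothetical helper: return the subset of `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# step_pls() reported that it could not find mixOmics, so check for it
# before calling prep() again.
missing <- missing_packages("mixOmics")
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
}
```

An empty result means the namespace loads in this session; it does not by itself guarantee that step_pls() will run.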

mkhansa commented 2 years ago

Thank you @EmilHvitfeldt, the second issue (step_pls) is solved! What worked was:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = '3.15')
BiocManager::install("mixOmics")

My post contains 2 independent parts. The first is step_umap, where my R session crashed after running it. [image]

The second is step_pls, where I had an error that is now solved.

EmilHvitfeldt commented 2 years ago

I see. Can you paste what you get if you call sessioninfo::session_info() after loading the packages?

library(recipes)
library(dplyr)
library(ggplot2)
library(embed)
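
If the full session_info() output is too long to share, a narrower report covering only the packages involved may be enough. This is a sketch in base R under stated assumptions: `report_pkgs()` is a hypothetical helper name, and the package list is a guess at what is relevant here. It reports each installed version plus the Built field of the package's DESCRIPTION, which records the R version the package was built under.

```r
# Hypothetical helper: version and "Built" field for each package,
# with NA for packages that are not installed.
report_pkgs <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  installed <- nzchar(paths)
  version <- rep(NA_character_, length(pkgs))
  built <- rep(NA_character_, length(pkgs))
  for (i in which(installed)) {
    version[i] <- as.character(utils::packageVersion(pkgs[i]))
    built[i] <- utils::packageDescription(pkgs[i], fields = "Built")
  }
  data.frame(package = pkgs, version = version, built = built,
             stringsAsFactors = FALSE)
}

# Assumed list of packages on the step_umap() path; adjust as needed.
print(report_pkgs(c("recipes", "embed", "uwot", "Matrix", "Rcpp")))
```

Packages built under an older R than the one currently running are worth noting in a report, although that alone does not establish the cause of a crash.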

mkhansa commented 2 years ago

  ─ Session info ───────────────────────────────────────────────────
  setting  value
  version  R version 4.2.0 (2022-04-22 ucrt)
  os       Windows 10 x64 (build 19043)
  system   x86_64, mingw32
  ui       RStudio
  language (EN)
  collate  English_United States.utf8
  ctype    English_United States.utf8
  tz       Asia/Beirut
  date     2022-06-15
  rstudio  2022.02.3+492 Prairie Trillium (desktop)
  pandoc   NA

  ─ Packages ───────────────────────────────────────────────────────
  package      * version    date (UTC) lib source
  assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.0.0)
  base64enc      0.1-3      2015-07-28 [1] CRAN (R 4.0.0)
  BiocManager    1.30.18    2022-05-18 [1] CRAN (R 4.2.0)
  class          7.3-20     2022-01-16 [2] CRAN (R 4.2.0)
  cli            3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
  clue           0.3-61     2022-05-30 [1] CRAN (R 4.2.0)
  cluster        2.1.3      2022-03-28 [1] CRAN (R 4.1.3)
  codetools      0.2-18     2020-11-04 [2] CRAN (R 4.2.0)
  colorspace     2.0-3      2022-02-21 [1] CRAN (R 4.1.2)
  combinat       0.0-8      2012-10-29 [1] CRAN (R 4.0.0)
  crayon         1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
  data.table     1.14.2     2021-09-27 [1] CRAN (R 4.1.1)
  DBI            1.1.2      2021-12-20 [1] CRAN (R 4.1.2)
  digest         0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
  doParallel     1.0.17     2022-02-07 [1] CRAN (R 4.1.2)
  dplyr        * 1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.0.5)
  embed        * 0.2.0      2022-04-13 [1] CRAN (R 4.2.0)
  factoextra     1.0.7      2020-04-01 [1] CRAN (R 4.0.2)
  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.0.4)
  forcats        0.5.1      2021-01-27 [1] CRAN (R 4.0.4)
  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.1.2)
  fs             1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
  future         1.26.1     2022-05-27 [1] CRAN (R 4.2.0)
  future.apply   1.9.0      2022-04-25 [1] CRAN (R 4.2.0)
  generics       0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
  ggplot2      * 3.3.6      2022-05-03 [1] CRAN (R 4.2.0)
  ggrepel        0.9.1      2021-01-15 [1] CRAN (R 4.0.4)
  globals        0.15.0     2022-05-09 [1] CRAN (R 4.2.0)
  glue           1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
  gower          1.0.0      2022-02-03 [1] CRAN (R 4.1.2)
  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.0.0)
  hardhat        1.1.0      2022-06-10 [1] CRAN (R 4.2.0)
  haven          2.5.0      2022-04-15 [1] CRAN (R 4.2.0)
  highr          0.9        2021-04-16 [1] CRAN (R 4.0.5)
  hms            1.1.1      2021-09-26 [1] CRAN (R 4.1.1)
  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
  httpuv         1.6.5      2022-01-05 [1] CRAN (R 4.1.2)
  ipred          0.9-13     2022-06-02 [1] CRAN (R 4.2.0)
  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.1.2)
  jsonlite       1.8.0      2022-02-22 [1] CRAN (R 4.1.2)
  keras          2.9.0      2022-05-23 [1] CRAN (R 4.2.0)
  klaR           1.7-0      2022-03-10 [1] CRAN (R 4.1.2)
  labelled       2.9.1      2022-05-05 [1] CRAN (R 4.2.0)
  later          1.3.0      2021-08-18 [1] CRAN (R 4.1.0)
  lattice        0.20-45    2021-09-22 [2] CRAN (R 4.2.0)
  lava           1.6.10     2021-09-02 [1] CRAN (R 4.1.1)
  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.0.0)
  lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.1.1)
  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
  MASS           7.3-57     2022-04-22 [1] CRAN (R 4.2.0)
  Matrix         1.4-1      2022-03-23 [1] CRAN (R 4.1.3)
  mime           0.12       2021-09-28 [1] CRAN (R 4.1.1)
  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.0.0)
  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.0.0)
  nnet           7.3-17     2022-01-16 [2] CRAN (R 4.2.0)
  parallelly     1.32.0     2022-06-07 [1] CRAN (R 4.2.0)
  pillar         1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.0.0)
  png            0.1-7      2013-12-03 [1] CRAN (R 4.0.0)
  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.0.0)
  promises       1.2.0.1    2021-02-11 [1] CRAN (R 4.0.4)
  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.0.0)
  questionr      0.7.7      2022-01-31 [1] CRAN (R 4.1.2)
  R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
  Rcpp           1.0.8.3    2022-03-17 [1] CRAN (R 4.1.3)
  readr          2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
  recipes      * 0.2.0      2022-02-18 [1] CRAN (R 4.1.2)
  reticulate     1.25       2022-05-11 [1] CRAN (R 4.2.0)
  rlang          1.0.2      2022-03-04 [1] CRAN (R 4.1.3)
  rpart          4.1.16     2022-01-24 [2] CRAN (R 4.2.0)
  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.0.3)
  scales         1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
  shiny          1.7.1      2021-10-02 [1] CRAN (R 4.1.1)
  stringi        1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.0.0)
  survival       3.3-1      2022-03-03 [1] CRAN (R 4.1.2)
  tensorflow     2.9.0      2022-05-21 [1] CRAN (R 4.2.0)
  textdata       0.4.2      2022-05-02 [1] CRAN (R 4.2.0)
  textrecipes    0.5.2      2022-05-04 [1] CRAN (R 4.2.0)
  tfruns         1.5.0      2021-02-26 [1] CRAN (R 4.1.2)
  tibble         3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
  tidyr          1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
  tidyselect     1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
  timeDate       3043.102   2018-02-21 [1] CRAN (R 4.0.0)
  tzdb           0.3.0      2022-03-28 [1] CRAN (R 4.1.3)
  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
  uwot           0.1.11     2021-12-02 [1] CRAN (R 4.2.0)
  vctrs          0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
  whisker        0.4        2019-08-28 [1] CRAN (R 4.0.0)
  withr          2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.0.0)
  zeallot        0.1.0      2018-01-28 [1] CRAN (R 4.0.0)

  [1] C:/Users/Toshiba/AppData/Local/R/win-library/4.2
  [2] C:/Program Files/R/R-4.2.0/library

EmilHvitfeldt commented 2 years ago

Hmm, that is odd. Could you try re-installing the uwot package, restarting the R session, and trying again?

mkhansa commented 2 years ago

I re-installed the uwot, embed, and umap packages, but I still get the same crash after the prep step.

library(recipes)
library(dplyr)
library(ggplot2)
library(embed)
library(uwot)

Umap <- recipe(Species ~ ., data = iris) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  step_umap(all_numeric_predictors(), num_comp = 2)
# it works up to here

Umap %>% prep()
# R crashes after this step

EmilHvitfeldt commented 2 years ago

Thank you!

Are you able to run the following code? If yes then you are having a {uwot} issue and not a {embed} issue.

library(uwot)
umap(X = iris[, 1:4], n_neighbors = 15, n_components = 2, 
     learning_rate = 1, min_dist = 0.01, verbose = FALSE, n_threads = 1)
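
Since a fatal error takes down the whole session and cannot be caught with tryCatch(), one option is to run each candidate snippet in a separate R process and compare exit statuses. The following is a minimal sketch in base R; `run_isolated()` is a hypothetical helper name, and the two snippets simply restate code already posted in this thread.

```r
# Hypothetical helper: write `code_lines` to a temporary script, run it with
# Rscript in a child process, and return the child's exit status.
# 0 means the script finished normally; non-zero means an error or a crash.
run_isolated <- function(code_lines) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script), add = TRUE)
  writeLines(code_lines, script)
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, shQuote(script), stdout = FALSE, stderr = FALSE)
  status
}

# Direct uwot call, as posted above.
uwot_status <- run_isolated(c(
  "library(uwot)",
  "umap(X = iris[, 1:4], n_neighbors = 15, n_components = 2,",
  "     learning_rate = 1, min_dist = 0.01, verbose = FALSE, n_threads = 1)"
))

# The same embedding requested through a recipe.
embed_status <- run_isolated(c(
  "library(recipes)",
  "library(embed)",
  "rec <- recipe(Species ~ ., data = iris) %>%",
  "  step_umap(all_numeric_predictors(), num_comp = 2)",
  "prep(rec)"
))

print(c(uwot = uwot_status, embed = embed_status))
```

A zero status for the first snippet and a non-zero status for the second would suggest the problem only appears on the recipe path, without costing the interactive session each time. A non-zero status also results when a package is simply not installed, so the two cases need to be told apart by running the snippet interactively once.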

mkhansa commented 2 years ago

The code executed successfully (output of 2 components).

The uwot package has been re-installed (with dependencies). Any advice?

Thank you for your time!

EmilHvitfeldt commented 2 years ago

Thank you for your time!

That is why I'm here!

Idea 1

Have you tried installing the dev versions of {embed} and {recipes}?

# install.packages("remotes")
remotes::install_github("tidymodels/recipes")
remotes::install_github("tidymodels/embed")

Idea 2

Do you still get issues when running a smaller recipe?

library(recipes)
library(embed)

recipe(Species ~ ., data = iris) %>%
  step_umap(all_numeric_predictors(), num_comp = 2) %>%
  prep()

mkhansa commented 2 years ago

Idea 2 was already tested and failed. Idea 1 has now been tested and failed as well.

EmilHvitfeldt commented 2 years ago

Can you reproduce this result?

library(recipes)
library(embed)

rec <- recipe(Species ~ ., data = iris) %>%
  step_umap(all_numeric_predictors(), num_comp = 2)

embed:::prep.step_umap(rec$steps[[1]], iris, rec$var_info)
#> UMAP embedding for Sepal.Length, Sepal.Width, Petal.Length, Petal... [trained]

mkhansa commented 2 years ago

Sure. [image]

EmilHvitfeldt commented 1 year ago

Hello @mkhansa 👋

Sorry for the long wait, are you still seeing this problem?

nhardtskim commented 1 year ago

Hello @mkhansa 👋

Sorry for the long wait, are you still seeing this problem?

Using version 1.0.0.9000, installed from GitHub today, it crashes on my Windows machine. It does, however, work fine on Debian/aarch64.

EmilHvitfeldt commented 1 year ago

Thank you @nhardtskim !! Do you still get an error if you use the dev version of {uwot}?

install.packages("remotes")
remotes::install_github("jlmelville/uwot")

nhardtskim commented 1 year ago

Thanks, @EmilHvitfeldt. Yea, still crashing.

other attached packages:
[1] uwot_0.1.14.9000 Matrix_1.5-3     embed_1.0.0.9000
[4] recipes_1.0.4    dplyr_1.1.0

I have to add that I never found uwot to be quite stable; it would usually start crashing once the number of components exceeded 15 or so.

EmilHvitfeldt commented 1 year ago

@jlmelville 👋 I'm bringing you in to ask if you are able to tell whether this is an {uwot} or an {embed} issue. Many thanks! 😄

jlmelville commented 1 year ago

Hello, so it seems that there is evidence from multiple users that running:

library(recipes)
library(embed)

rec <- recipe(Species ~ ., data = iris) %>%
  step_umap(all_numeric_predictors(), num_comp = 2)

embed:::prep.step_umap(rec$steps[[1]], iris, rec$var_info)
#> UMAP embedding for Sepal.Length, Sepal.Width, Petal.Length, Petal... [trained]

on Windows causes the session to crash?

I do most of my R development on Windows (weird I know), but I cannot reproduce this. I ran the above in RStudio on Windows and it finished without problems. Also, running that snippet under Linux with valgrind didn't trigger anything so I am at a bit of a loss as to what is going on. That said, there is a lot of C++ code in uwot and I have definitely caused these issues many many times in the past, so if I was a betting man, I would say it's something I did. I just don't know what.

Based on the snippet and using the debugger in RStudio, it seems like here uwot is getting called with a dataframe containing the 4 numeric columns of the iris dataframe in X and n_components = 2? Is there anything else going on that would be "unusual"? iris is the main dataset that I use to test uwot, and I must have run it on that same data hundreds of times by now, so I am very surprised by this, to say the least.
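
For reference, the input described above can be reproduced and sanity-checked in base R. This sketch only inspects the data and does not call uwot; it assumes, as described, that the recipe hands over the four numeric columns of iris unchanged.

```r
# Select the numeric columns of iris, which is what
# all_numeric_predictors() is expected to pick up in this recipe.
X <- iris[, vapply(iris, is.numeric, logical(1)), drop = FALSE]

# Shape and column names of the assumed input.
print(dim(X))
print(names(X))

# Check that every value is finite (no NA, NaN, or Inf).
all_finite <- all(vapply(X, function(col) all(is.finite(col)), logical(1)))
print(all_finite)
```

If these checks pass on the affected machine, the data itself is unlikely to be what is unusual.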

If there is anything specific I can do to help diagnose the issue further, please let me know.