macss-modeling / General-Questions

A repo to post questions about code, data, etc.

0226 In-Class Coding Question Regarding UMAP Code Running Time #27

Closed: mintaow closed this issue 3 years ago

mintaow commented 3 years ago

Hey crew, for the UMAP section of our in-class coding task in Thursday's class (0226), my code runs incredibly slowly (definitely more than 10 minutes), while my t-SNE code takes only around 10 seconds. This seems counterintuitive to me, and as far as I can tell, I'm not the only one running into it. I'd really appreciate any comments on this. Thanks so much in advance!

Below is my code, which I barely changed from the version we went through in lecture. Sorry for not tracking the exact run time; it just takes forever.

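{
  # Packages from lecture: tidyverse (dplyr, tidyr, ggplot2), umap, tictoc, amerika
  library(tidyverse)
  library(umap)
  library(tictoc)
  library(amerika)

  tic()
  # Fit UMAP on the 35 feature columns; `anes` is the data set loaded earlier in class.
  # n_neighbors = 1000 is far above the umap package default of 15.
  umap_fit_1000 <- anes[, 1:35] %>%
    umap(n_neighbors = 1000,
         metric = "euclidean")

  # Mean-center the numeric columns, attach the 2-D layout, and reshape to long format.
  # Stored under a new name so the fitted umap object is not overwritten.
  umap_df_1000 <- anes %>%
    mutate_if(.predicate = is.numeric,
              .funs = scale,
              scale = FALSE) %>%
    mutate(First_Dimension = umap_fit_1000$layout[, 1],
           Second_Dimension = umap_fit_1000$layout[, 2]) %>%
    gather(key = "Variable",
           value = "Value",
           c(-First_Dimension, -Second_Dimension, -democrat))

  # `democrat` was centered above, so its two values are no longer 0/1;
  # that is why the breaks below are the centered values written as strings.
  k_1000 <- ggplot(umap_df_1000, aes(First_Dimension, Second_Dimension,
                                     col = factor(democrat))) +
    geom_point(alpha = 0.6) +
    scale_color_manual(values = c(amerika_palettes$Republican[1],
                                  amerika_palettes$Democrat[1]),
                       name = "Democrat",
                       breaks = c("-0.418325434439179",
                                  "0.581674565560822"),
                       labels = c("No",
                                  "Yes")) +
    labs(title = " ",
         subtitle = "Neighborhood size: 1000",
         x = "First Dimension",
         y = "Second Dimension") +
    theme_minimal()
  toc()
}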
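One rough way to see how much the neighborhood size drives run time is to time the same fit at a few values of `n_neighbors`. This is a hypothetical sketch, not part of the lecture code: it uses a simulated 1,000 × 35 matrix (`sim`) in place of `anes[, 1:35]`, and base R's `system.time()` instead of `tic()`/`toc()`. For comparison, if the t-SNE run used Rtsne's defaults, its Barnes-Hut approximation looks at roughly 3 × perplexity = 90 neighbors per point (perplexity 30), far fewer than 1000, which may account for part of the gap between the two methods.

```r
library(umap)  # same UMAP package as the lecture code

set.seed(226)
# Simulated stand-in for anes[, 1:35]: 1,000 rows of 35 numeric features (hypothetical data)
sim <- matrix(rnorm(1000 * 35), ncol = 35)

neighbor_sizes <- c(15, 50, 200)

# Elapsed seconds for one UMAP fit at each neighborhood size; all other settings stay at defaults
timings <- sapply(neighbor_sizes, function(k) {
  system.time(
    umap(sim, n_neighbors = k, metric = "euclidean")
  )[["elapsed"]]
})
names(timings) <- paste0("n_neighbors = ", neighbor_sizes)

print(timings)
```

The exact numbers depend on the machine, but elapsed time would be expected to grow with `n_neighbors`, since each point contributes more edges to the neighbor graph that UMAP builds and then optimizes.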
pdwaggoner commented 3 years ago

Yeah, I ran into this issue with this chunk too. The short answer is that it's a massive search space, so it takes a while. Are you using parallel processing? It might speed things up a bit, but you'll likely have to sit and wait (and maybe drink some coffee) while it runs. It should converge within an hour, though.
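On the parallel-processing point: as far as I know, the `umap` package used in lecture has no thread-count setting. One option is to swap in the separate `uwot` package, whose `umap()` accepts `n_threads` (nearest-neighbor search) and `n_sgd_threads` (layout optimization). This is a sketch under assumptions: it is a different UMAP implementation than the lecture's, so layouts will not match exactly, and the simulated matrix `sim` is hypothetical data standing in for `anes[, 1:35]`.

```r
library(uwot)      # alternative UMAP implementation with multithreading support
library(parallel)  # for detectCores()

set.seed(226)
# Simulated stand-in for anes[, 1:35] (hypothetical data, not the class data set)
sim <- matrix(rnorm(2000 * 35), ncol = 35)

# Use all but one core, and at least one (detectCores() can return NA on some platforms)
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)

elapsed <- system.time(
  emb <- uwot::umap(sim,
                    n_neighbors = 1000,
                    metric = "euclidean",
                    n_threads = n_cores,      # threads for the nearest-neighbor search
                    n_sgd_threads = n_cores)  # threads for the layout optimization
)[["elapsed"]]

dim(emb)  # one row per observation, two embedding columns
elapsed   # wall-clock seconds for the fit
```

Note that with `n_sgd_threads` greater than 1, uwot's results are not exactly reproducible even with `set.seed()`, so for a final figure it may be worth setting `n_sgd_threads = 0` and parallelizing only the neighbor search.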

mintaow commented 3 years ago

Thanks for the explanation. It helps a lot!