alkat19 opened 2 months ago
I've restated your example in runnable code. Can you confirm this is accurate? (Replicating openintro::nycflights six times creates a data frame with about 200,000 rows, not 2 million.)
Here's what I see, which doesn't seem very surprising to me: no big lag while running the model.
https://github.com/user-attachments/assets/8fab7218-6afd-4f4b-bcd1-c05d3591d094
If you're having a really different experience, can you capture a screen recording?
Here's the reprex:
```r
# Replicate the openintro::nycflights dataset 6 times
library(openintro)
#> Loading required package: airports
#> Loading required package: cherryblossom
#> Loading required package: usdata
library(tidyverse)

dim(nycflights)
#> [1] 32735    16

dat <- rbind(
  nycflights, nycflights, nycflights,
  nycflights, nycflights, nycflights
)

# Create a meaningless binary outcome based on hour:
# data <- data %>% mutate(outcome = if_else(hour > 15, 0, 1))
dat <- dat |>
  mutate(outcome = if_else(hour > 15, 0, 1))

# Positron lags a lot when running a (glm) model in a data of 2mil rows and 18 columns.
dim(dat)
#> [1] 196410     17

# Run a random glm:
model <- glm(
  outcome ~ day + month + dep_time + air_time + distance,
  family = "binomial",
  data = dat
)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# See the platform lagging (cannot scroll through code for a couple of seconds)
```
Created on 2024-09-11 with reprex v2.1.1.9000
I have updated my initial post with code to reproduce it.
Here's a new reprex. The `system.time()` result seen here (about 11 seconds) matches what I'm seeing in both Positron and RStudio, and I don't see any lags in either IDE other than the time spent running the model. So far, I'm not able to reproduce.
```r
# Replicate the openintro::nycflights dataset 64 times
library(openintro)
#> Loading required package: airports
#> Loading required package: cherryblossom
#> Loading required package: usdata
library(tidyverse)

dim(nycflights)
#> [1] 32735    16

dat <- do.call(rbind, replicate(n = 64, nycflights, simplify = FALSE))

# Create a meaningless binary outcome based on hour:
# data <- data %>% mutate(outcome = if_else(hour > 15, 0, 1))
dat <- dat |>
  mutate(outcome = if_else(hour > 15, 0, 1))

# Positron lags a lot when running a (glm) model in a data of 2mil rows and 18 columns.
dim(dat)
#> [1] 2095040      17

# Run a random glm:
system.time(
  model <- glm(
    outcome ~ day + month + dep_time + air_time + distance,
    family = "binomial",
    data = dat
  )
)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#>    user  system elapsed
#>   9.387   1.384  10.837

# See the platform lagging (cannot scroll through code for a couple of seconds)
```
Created on 2024-09-11 with reprex v2.1.1.9000
https://github.com/user-attachments/assets/25dd3135-6fcd-49eb-81dc-dacf35bb9ca1
I cannot attach files larger than 10 MB, unfortunately, so I had to make a short recording. You can see that I cannot scroll after the red button disappears. Other operations after that point lag as well; for example, running the following after the model fits also lags, and I am again unable to scroll properly:
```r
dat <- dat |>
  mutate(outcome2 = if_else(hour > 13, 0, 1))
```
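As a quick sanity check (an editor's sketch, not code from the thread), the recode itself is cheap in plain R even at this scale, which would point at the IDE rather than the computation. This uses base `ifelse()` on simulated data so it runs without the openintro or tidyverse packages:

```r
# Sketch: time the equivalent recode on a simulated data frame of
# ~2 million rows (hour values 0-23, like the flights data).
n <- 2e6
dat_sim <- data.frame(hour = sample(0:23, n, replace = TRUE))

elapsed <- system.time(
  dat_sim$outcome2 <- ifelse(dat_sim$hour > 13, 0, 1)
)["elapsed"]

# The recode takes a small fraction of a second on typical hardware,
# so a multi-second UI freeze would not be explained by this step.
print(elapsed)
```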
Can you also provide a screen recording where I can see the ability to scroll through code seamlessly after the model is run?
> the ability to scroll through code seamlessly
OK, this is interesting; let's look at scrolling specifically. I'm talking about scrolling in the R Console.
I do see peculiar scrolling behaviour once I've fitted the model (it feels stuck and slow). It seems to begin when the `model` object first populates the Session pane (though of course that also coincides with the `model` object existing in the first place). But the environment viewer (in, e.g., RStudio) has a lot of potential for doing unsavory things in the presence of a large object.
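To gauge how heavy such an object is for an environment viewer to summarise, here is a self-contained sketch on simulated data (an editor's illustration, not the flights reprex; the `model = FALSE` / `x = FALSE` / `y = FALSE` slimming is a standard base-R option, not something the thread suggests). A fitted `glm` keeps per-row components such as the model frame, fitted values, residuals, and weights, so its size grows linearly with the number of rows:

```r
# Sketch: how much a fitted glm object carries around, on simulated data.
set.seed(1)
n <- 1e5
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))

fit <- glm(y ~ x, family = binomial, data = d)

# Per-row components dominate the object's size:
sizes <- sort(sapply(fit, function(comp) as.numeric(object.size(comp))),
              decreasing = TRUE)
head(sizes)

# If only the coefficients/summary are needed, the object can be slimmed:
slim <- glm(y ~ x, family = binomial, data = d,
            model = FALSE, x = FALSE, y = FALSE)

print(format(object.size(fit), units = "MB"))
print(format(object.size(slim), units = "MB"))
```

The slimmed fit gives identical coefficients while dropping the stored model frame and response vector, which may also reduce what the viewer has to inspect.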
Sounds similar to #4573
If I'm right about the variables pane, then maybe related to #2223
This may also resemble #4008, and could be connected to #2797.
The problem is that it becomes an issue when there are hundreds of lines of code after the model fit, since everything feels stuck or slow from that point on. The behavior does not occur in RStudio or VS Code (which I currently use with radian).
Thanks for looking into this.
System details:
Positron and OS details:
Positron Version: 2024.09.0 (Universal) build 1
Code - OSS Version: 1.92.0
Commit: f37f4f5044a2a619e73d5db61a31e37fbd3faf18
Date: 2024-09-03T02:37:20.474Z (1 wk ago)
Electron: 30.1.2
Chromium: 124.0.6367.243
Node.js: 20.14.0
V8: 12.4.254.20-electron.0
OS: Darwin arm64 23.6.0
Interpreter details:
R 4.4.1
Describe the issue:
Positron lags a lot when running a glm model on a data frame of about 2 million rows.
Steps to reproduce the issue:
Expected or desired behavior:
There should be no lag apart from the time spent waiting for the model to run, similar to RStudio or VS Code.
Were there any error messages in the UI, Output panel, or Developer Tools console?
No