The performance of multicore computing is SIGNIFICANTLY inferior

randy3k / radian

A 21 century R console

MIT License


The performance of multicore computing is SIGNIFICANTLY inferior #411

Open albert-ying opened 1 year ago

albert-ying commented 1 year ago

Test code

# Load the libraries
library(pbmcapply)
library(data.table)

# Define a more complex function to test
test_function <- function(x) {
  # Create a data.table
  dt <- data.table(a = 1:1000000, b = 1:1000000 + 10)

  # Write the data.table to a temporary file
  temp_file <- tempfile(fileext = ".csv")
  fwrite(dt, temp_file)

  # Read the data.table from the temporary file
  dt_read <- fread(temp_file)

  # Perform a calculation on the data.table
  dt_result <- dt_read[, .(result = a * x + b)]

  # Return the result
  return(dt_result)
}

# Create a vector of input values
input_values <- 1:1000

# Benchmark the time needed for pbmclapply
benchmark_time <- system.time({
  result <- pbmclapply(input_values, test_function, mc.cores = 10)
})

# Print the benchmark time
print(benchmark_time)

using Rscript

   user  system elapsed
  3.185   9.992  33.849

using radian

   user  system elapsed
  3.920  15.280 175.334

using R console

   user  system elapsed
  2.274   8.727  58.214

radian also uses significantly more RAM after running for a while (to me it seems that gc is not working effectively).

albert-ying commented 1 year ago

I also noticed during testing that radian spawns python processes instead of R processes.
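One way to check which executable the forked workers belong to is to ask `ps` for the command name of each worker's PID. This is only a rough sketch: it assumes a Unix-like system where `ps -o comm= -p <pid>` is available, and `proc_name` is a hypothetical helper, not part of any package.

```r
library(parallel)

# Hypothetical helper: command name of a process, looked up by PID.
# Assumes a Unix-like system with a `ps` that supports `-o comm=`.
proc_name <- function(pid) {
  trimws(system2("ps", c("-o", "comm=", "-p", pid), stdout = TRUE))
}

# Name of the process that hosts the current R session.
parent <- proc_name(Sys.getpid())

# Each forked worker reports the name of its own process.
workers <- unlist(mclapply(1:2, function(i) proc_name(Sys.getpid()), mc.cores = 2))

cat("parent: ", parent, "\n")
cat("workers:", workers, "\n")
```

Because `mclapply` forks the current process, the workers should report the same name as the parent, so running this in each front end shows which process is being forked.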

albert-ying commented 1 year ago

I'm really shocked by this (no offense, I really like radian). I've been using radian for 4 years and have done a lot of multithreaded computing, and I only found this out today. So many hours wasted...

D3SL commented 1 year ago

Radian has some severe issues in general with cleaning up after itself. Many of us are unable to use the language server at all because it simply produces an ever-growing number of R processes. The problem seems to be related to the development environment, because the developer is unable to reproduce the issue on their machine, yet many users experience it in deployment on various installations.

It's a serious shame, because VS Code with radian is a massively superior experience to RStudio.

randy3k commented 10 months ago

@D3SL By the way, why do you want to run languageserver on top of radian? You shouldn't need radian to execute the languageserver.

randy3k commented 10 months ago

@albert-ying Sorry for getting back to you late.

Ultimately, it depends on how many processes are spawned, how long each process takes, and whether the tasks are CPU- or memory-intensive.

For example, if each task is computationally expensive, like the following:

system.time(parallel::mclapply(1:8, function(i) {
    x <- rnorm(4e3)
    y <- rnorm(4e3)
    s <- svd(x %o% y)
    sum(s$d)
}, mc.cores = 8))

radian

   user  system elapsed
153.260   0.672  22.688

R

   user  system elapsed
152.947   0.840  23.304

there is virtually no difference between radian and R.
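The effect of the number of spawned processes can be looked at separately from compute cost. The sketch below is an illustration rather than a diagnosis of the original benchmark: it runs trivial tasks through `parallel::mclapply` twice, once with `mc.preschedule = TRUE` (tasks are divided among `mc.cores` workers up front) and once with `mc.preschedule = FALSE` (one fork per task). The task count and core count are arbitrary assumptions.

```r
library(parallel)

n_tasks <- 200             # arbitrary number of tasks for illustration
trivial <- function(i) i   # almost no work, so timing is dominated by process overhead

# Prescheduled: tasks are split across the workers, so only a few forks happen.
t_pre <- system.time(
  res_pre <- mclapply(seq_len(n_tasks), trivial, mc.cores = 2, mc.preschedule = TRUE)
)

# Not prescheduled: a new process is forked for every task.
t_fork <- system.time(
  res_fork <- mclapply(seq_len(n_tasks), trivial, mc.cores = 2, mc.preschedule = FALSE)
)

# Elapsed seconds for each strategy.
print(c(prescheduled = t_pre[["elapsed"]], per_task_fork = t_fork[["elapsed"]]))
```

Running the same script under Rscript, R, and radian would show whether the gap between front ends grows with the number of forks.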

For memory-intensive tasks, radian is not a suitable replacement for vanilla R: radian is in fact a python process, so it requires more memory.
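To compare memory use between front ends, one rough approach is to read the resident set size (RSS) of the session and of its forked workers from `ps`. This sketch assumes a Unix-like system where `ps -o rss=` reports kilobytes; `rss_mb` is a hypothetical helper. RSS of a forked child includes pages still shared with the parent, so the numbers are only an approximate guide.

```r
library(parallel)

# Hypothetical helper: resident set size of a process in megabytes.
# Assumes a Unix-like `ps` where `-o rss=` reports kilobytes.
rss_mb <- function(pid = Sys.getpid()) {
  kb <- as.numeric(trimws(system2("ps", c("-o", "rss=", "-p", pid), stdout = TRUE)))
  kb / 1024
}

# Memory held by the process hosting the session (R itself, or python under radian).
parent_rss <- rss_mb()

# Memory reported by each forked worker for its own process.
worker_rss <- unlist(mclapply(1:4, function(i) rss_mb(), mc.cores = 4))

cat("parent RSS (MB): ", round(parent_rss, 1), "\n")
cat("worker RSS (MB): ", round(worker_rss, 1), "\n")
```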