microsoft / sqlmlutils

Utility functions for easier usage of SQL Server Machine Learning Services
Other
32 stars 33 forks source link

R script execution slow after package installation #95

Closed rmeans closed 2 years ago

rmeans commented 2 years ago

This issue may not be directly related to this SQLMLUTILS, but maybe you would be able to provide some context or a potential solution.

As we install more libraries, especially tidyverse which installs 70+ sub packages, execution of even the most basic R script goes from sub second execution to 7+ seconds.

Here is the installation script:

library(sqlmlutils)

connection <- connectionInfo(
  server   = "XXXXXX",
  database = "XXXXXX")

r = getOption("repos")
r["CRAN"] = "https://mran.microsoft.com/snapshot/2019-02-01"
options(repos = r)

pkgs <- c("tictoc", "data.table", "tidyverse", "zoo", "DBI")
sql_install.packages(connectionString = connection, pkgs = pkgs, verbose = TRUE, scope = "PUBLIC")

The R script I am testing with is:

EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'
a <- 1
b <- 2
c <- a/b
d <- a*b
print(c(c, d))
'

As a separate test on the same server to rule out any issues with the packages, I installed the packages using the process documented for SQL Server 2017 which installs packages globally for all databases, the script executes in sub seconds times.

Script to test global installation of packages:

r = getOption("repos")
r["CRAN"] = "https://mran.microsoft.com/snapshot/2019-02-01"
options(repos = r)

pkgs <- c("tictoc", "data.table", "tidyverse", "zoo", "DBI")
install.packages(pkgs)

This would be fine, but our plan is to run the database in SQL Managed Instance on Azure and this global approach is not an option. By the way, I did try the test of running the simple script before and after installing packages on Azure and experienced the same behavior, less than a second before packages, 7+ seconds after packages.

Any guidance you could provide would be great!

jarupatj commented 2 years ago

This is a known issue. I think one of you 70+ packages might contains source packages like BH. Can you see if you have them? If so, can you try remove the source package? Source package like BH has a lot of files. We have an existing known issue that the query performance will degrade when there a lot of files in the installed external libraries. The mitigation right now is to remove the source package from the list of installed packages. We cannot run source package in SQL Server or SQL MI anyway.

We are working on the fix for this with high priority.

mcollis2 commented 2 years ago

Is there another thread reporting this issue for tracking, or can we follow this issue here? Thanks!

jarupatj commented 2 years ago

Feel free to follow this thread. The fix will be in SQL Server and we are targeting mid next year. In the mean time, I suggest going through the list of packages and remove source packages like BH.

seantleonard commented 2 years ago

The latest release of SQLMLUtils for R (version 1.2.1 - GitHub Releases) is published with changes from #99 to improve execution time for this scenario.