microsoft / sqlmlutils

Utility functions for easier usage of SQL Server Machine Learning Services
Other
32 stars 33 forks source link

Dev/seleonar/r packagename parsing #107

Closed seantleonard closed 1 year ago

seantleonard commented 1 year ago

Why is this change needed?

In order to install a newer version of a package that doesn't have a binary for the desired version of R (i.e. SQL Managed Instance with R 3.5.2), the package must be built from source using the desired version of R.

For example, the package data.table's most recent version is 1.14.6, but was built using R 4.2.2 which is a higher major.minor version number than R 3.5.2. Therefore, a binary package of data.table must be built from source using R 3.5.2 and Rtools35 locally.

As a result of the requirement to build a binary package, I discovered a bug where sqlmlutils did not parse binary package filenames correctly. For the package data.table, the regex \.|_ was erroneously parsing the package name using the value before the first period OR before the underscore, causing the resolved package name to be data instead of data.table, causing installation failure on SQL MI.

What is this change?

This change fixes the regex to resolve a package name by looking at the character sequence before an underscore _ in the binary package name. Binary packages on CRAN follow the syntax `_.zip for Windows. An example registry of binary packages for Windows for R 3.5.X can be found on CRAN

While the only functionality change in this PR is the regex modification, I also split out the relevant sections of code into smaller, separate helper functions for easier readability and testing.

How was this tested?

Sample Execution Steps

library(sqlmlutils)
connection <- connectionInfo(server= "tcp:<server>,<port>", database = "<DBName>",uid = "<userName>",pwd= "<pwd>")
pkgPath <- c('C:\\data.table_1.14.6.zip')
sql_install.packages(connectionString = connection, pkgPath, verbose = TRUE, scope="PUBLIC", repos=NULL)
arunguru-msft commented 1 year ago

The workaround to build the package locally and install remotely will probably work only if the version of the R is the same. Do we have validations for the same in the E2E workflow?

seantleonard commented 1 year ago

The workaround to build the package locally and install remotely will probably work only if the version of the R is the same. Do we have validations for the same in the E2E workflow?

There is code that covers that scenario here, but no test coverage at the moment. That is outside the scope of this particular fix : https://github.com/microsoft/sqlmlutils/blob/b888ff1e875592446c1cde6256e2a61162066388/R/R/sqlPackage.R#L1344-L1419