oxford-pharmacoepi / MegaStudy

4 stars 2 forks source link

SQL error running executeChecks #2

Closed eric-fey-hus closed 7 months ago

eric-fey-hus commented 10 months ago

Database connection work fine. Details:

db <- dbConnect(odbc::odbc(),
                Driver = "{ODBC DRIVER 18 for SQL Server}", ...

Error occurs when running line 36 in RunFeasibility.R: executeChecks( ...):

Error: nanodbc/nanodbc.cpp:1769: 42S02: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'tempdb.pdw.sysobjects'.

'select * from tempdb..sysobjects'

sessionInfo:

R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: [1] LC_COLLATE=English_Europe.utf8 LC_CTYPE=English_Europe.utf8 LC_MONETARY=English_Europe.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Europe.utf8

attached base packages: [1] stats graphics grDevices datasets utils methods base

other attached packages: [1] DrugExposureDiagnostics_1.0.0 zip_2.3.0 log4r_0.4.3 here_1.0.1
[5] CDMConnector_1.1.2 CirceR_1.3.1 dbplyr_2.3.4 dplyr_1.1.3
[9] DBI_1.1.3 RPostgres_1.4.5 odbc_1.3.5

loaded via a namespace (and not attached): [1] Rcpp_1.0.11 pillar_1.9.0 compiler_4.2.1 remotes_2.4.2.1 tools_4.2.1 bit_4.0.5 jsonlite_1.8.7
[8] lifecycle_1.0.3 tibble_3.2.1 checkmate_2.2.0 pkgconfig_2.0.3 rlang_1.1.1 cli_3.6.1 rstudioapi_0.15.0 [15] curl_5.1.0 rJava_1.0-6 stringr_1.5.0 withr_2.5.1 generics_0.1.3 vctrs_0.6.4 hms_1.1.3
[22] bit64_4.0.5 rprojroot_2.0.3 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.5 purrr_1.0.2
[29] blob_1.2.4 magrittr_2.0.3 backports_1.4.1 renv_1.0.3 utf8_1.2.3 stringi_1.7.12

tiozab commented 10 months ago

@eric-fey-hus Thank you. Which database management system are you working with?

We use Postgres and I had the exact same error message myself when I just started my position here and it really was down to an error in the credentials on accessing the database, a copy paste error in fact. the db <- ... step is not the opening the connection just setting the credentials. the cdm connection is established later and the error is thrown when you are trying to use the cdm (in the execute step function).

can you please share this bit below with me and remove all the sensitive information (or alter them)

db <- dbConnect("...")

cdmSchema <- "..." # schema where cdm tables are located writeSchema <- "..." # schema with writing permission writePrefix <- "..." # combination of at least 5 letters + (eg. "abcde") that will lead any table written in the cdm dbName <- "..." # name of the database, use acronym in capital letters (eg. "CPRD GOLD")

for me it looks like this (I get all the information from the Renviron file, in case you do not have that, it may be worthwhile to set that up via running in the console : usethis::edir_r_environ() , after setting your Renviron file, you have to save it and re-start the R session so that the changes take effect) server_dbi <- Sys.getenv("DB_SERVER_DBI_gold_202207") user <- Sys.getenv("DB_USER") password <- Sys.getenv("DB_PASSWORD") port <- Sys.getenv("DB_PORT") host <- Sys.getenv("DB_HOST")

db <- dbConnect(RPostgres::Postgres(), dbname = server_dbi, port = port, host = host, user = user, password = password )

Of course you can also have all the information directly in the dbConnect() function.

after running the db step and setting all the paramters to create the cdm object cdmSchema <- "..." # schema where cdm tables are located writeSchema <- "..." # schema with writing permission writePrefix <- "..." # combination of at least 5 letters + (eg. "abcde") that will lead any table written in the cdm dbName <- "..." # name of the database, use acronym in capital letters (eg. "CPRD GOLD")

can you please run the following:

cdm <- cdmFromCon( con = db, cdmSchema = c(schema = cdmSchema), writeSchema = c(schema = writeSchema, prefix = writePrefix), cdmName = dbName )

cdm$person

given what you showed me above, this simple step should throw an error and prove that I was right about the error only occuring when you want to use the database, which means that the connection was not established and you need to double check your connection details.

Please send me all the code that I suggest above, and the answers R is giving you and post it here, so I can see what is going on :-)

eric-fey-hus commented 10 months ago

@tiozab Thanks! @pasirikala tagging you here as you might have better insight.

DBMS is synapse, more precisely: dedicated SQL pool in Azure Synapse Analytics.

The connection work fine and we can use the usual OMOP tables:

db <- dbConnect(odbc::odbc(),
                Driver = "{ODBC DRIVER 18 for SQL Server}",
                Server = "MYSERVER",  
                Database = "MYDB",
                UID = "MYUSER", 
                PWD = "MYPWD",
                )

Then I tested the connection using DBI::dbListTables(db), which worked fine.

For your test I did

#Test to resolve issue: https://github.com/oxford-pharmacoepi/MegaStudy/issues/2
cdm <- cdmFromCon(
  con = db,
  cdmSchema = c(schema = "omop54"),
  writeSchema = c(schema = "ohdsieric", prefix = "mega_"),
  cdmName = "HUS"
)
cdm$person

I am not showing the result of cdm$person obviously, that's sensitive info :-), but it worked just fine.

I think the issue here is how to list the temp tables on Azure Synapse. Could not get that working, and from what I understand temp tables do not persist between queries. Ref: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-temporary

tiozab commented 10 months ago

@ginberg since you are the maintainer of the DrugExposureDiagnostics Package, can you help me? Maybe we can continue with this issue in the DrugExposureDiagnostics Package? https://github.com/darwin-eu/DrugExposureDiagnostics @edward-burn, what do you think?

I am assuming that the DrugExposureDiagnostic package does not support the "Azure Synapse" Database management system?

tiozab commented 9 months ago

@eric-fey-hus I think same issue as https://github.com/oxford-pharmacoepi/CancerSurvivalWp2Analysis/issues/25 I am sorry but this is out of my hands, so for the time being (because your DBMS is not supported), you will not be able to participate in the megastudy @martapineda

eric-fey-hus commented 9 months ago

@tiozab We are working on the SQL Server version. In parallel, we would also like to get this working with our Synapse backend. For DatabaseConenctror using SQLRender we could solve this by adding a few small substitution patterns in replacementPatters.csv of SQLRender, and we had Synapse supported:

synapse,DROP TABLE IF EXISTS #@table;,"IF OBJECT_ID('tempdb..#@table', 'U') IS NOT NULL DROP TABLE #@table;"
synapse,DROP TABLE IF EXISTS @table;,"IF OBJECT_ID('@table', 'U') IS NOT NULL DROP TABLE @table;"
synapse,CREATE TABLE IF NOT EXISTS @table (@definition);,"IF OBJECT_ID('@table', 'U') IS NULL CREATE TABLE @table (@definition);"

Hoping to do the same here. It would be great if you could point us towards where the root issues arises (what package, function, etc?), then we can look for a solution there.

tiozab commented 9 months ago

@eric-fey-hus the package can be found here https://github.com/darwin-eu/DrugExposureDiagnostics

You said you got the error when running the executeChecks function (however, this function calls a ton of functions ;-))

but I am not sure if that is what you wanted to know?

tiozab commented 8 months ago

@eric-fey-hus congrats on uploading the results!! @martapineda. do you mind sharing with us how you got it to work? @edward-burn