zozlak / RODBCext

14 stars 10 forks source link

Support for Table Valued Parameters #16

Open smacc76 opened 7 years ago

smacc76 commented 7 years ago

We use stored procedures which accept SQL table variables as parameters to load data into database tables. This makes loading large datasets quick and easy and avoids other solutions like xml, bulk inserts or loops.

https://msdn.microsoft.com/en-us/library/bb675163(v=vs.110).aspx

Does RODBCext support table valued parameters at present? If not can this be added so that an R dataframe could be loaded directly into a SQL table in this manner. Thanks.

zozlak commented 7 years ago

RODBCext does not support table-valued parameters at the moment.

Table-valued parameters are Ms SQL-specific. Also stored procedures invalidating a prepared query seems to be Ms SQL-specific. Unfortunately I am not using Ms SQL (nor Windows which makes testing Ms SQL quite troublesome for me) and I am very sceptic about digging into Ms SQL-specific issues. Last but not least it seems you can get the same by putting your data into a temporary table and running your stored procedure on the data in this table.

This means if you want such a feature to be added you need to convince me a little and to contribute.

At the beginning I would ask you to provide a minimum reproducible example - an SQL code of:

(connected with #15)

smacc76 commented 7 years ago

Hi zozlak,

I have attached (inserted - uploader not working) a simple working example of a table valued parameter in MS SQL.

The benefit of this approach is that if we are loading large data sets from R into our database (one user wants to load 600,000 records at a time) that we do one call and load the entire data set in one go. Contrast this with 600,000 individual INSERT INTO calls in a loop. The performance will be much improved using the table valued parameter approach.

It would be great if you could enhance RODBCExt to accept whole data.frames in the table valued parameter slot of a stored procedure.

Let me know if you need more SQL code or convincing.

Thank you.

`--Example of using Table Valued Parameter to bulk load data --instead of individual insert statements

--1. Define the Permanent Table, Test Data Type and Stored Procedure

-- This is our permanent table. CREATE Table TestPhysicalTable ( Number Int,
Name varchar(256)
)

-- This is defining our particular table type variable. CREATE TYPE Test AS TABLE (
Number Int,
Name varchar(256)
)

GO

-- Proc that accepts table data type CREATE PROCEDURE usp_TestTableValueParameter ( @TableVariable Test READONLY )

AS

BEGIN INSERT INTO TestPhysicalTable(Number, Name)

  SELECT Number, Name
  FROM @TableVariable

END

GO

--2. Create some data and pass into the Stored Procedure

-- This would be a data.frame in R DECLARE @DataTable AS Test

INSERT INTO @DataTable VALUES (1,'Tony'), (2,'David'), (3,'Simon'), (4,'Roger')

--Execute the stored procedure to load the data into the permanent table --RODBCExt needs the ability to handle entire data.frames in one go using the table valued parameter approach EXECUTE usp_TestTableValueParameter @TableVariable = @DataTable

--3. View the loaded data SELECT * FROM TestPhysicalTable

--Remove objects --Drop Table TestPhysicalTable --Drop Procedure usp_TestTableValueParameter --Drop Type Test`

zozlak commented 7 years ago

Just one clarification - is your main concern fast data loading (and table valued parameters are a way of achieving it advised to you) or are you interested in the table valued parameters themselves?

In the first case I'll to try to utilize the SQL_ATTR_PARAMSET_SIZE statement attribute. It should make INPUT and UPDATE queries slightly faster for drivers which support it (Ms SQL Server drivers do it for sure). This is because the performance gain is about reducing number of ODBC calls and not about using table valued parameters themselves (on the ODBC level you can still provide data row by row using table valued parameters which will provide you with negligible performance gain).

Providing support for the table valued parameters themselves looks quite troublesome. The whole parameters' binding looks slightly different for them (by the way this example has more C lines that the whole RODBCext C code :-) ) and I do not think I'll motivate myself in predictable future to implement such a vendor-specific curiosity (please do not take it personally, I understand that for the Ms SQL Server users it might be an everyday bread but I am referring to the ODBC here which is much broader then the MS SQL Server). Anyway RODBCext is an open source project so if you want to try to extend it with a table valued parameters support, it'll be highly appreciated.

MineAkgl commented 2 years ago

Is there any change of plans on this issue?

zozlak commented 2 years ago

I'm afraid there are no chances at all - see https://github.com/zozlak/RODBCext#deprecation-info.

wibeasley commented 2 years ago

To build on @zozlak's response, the odbc package supports this.

Here's a public example with RSQLite, but I've done the same with odbc.

https://github.com/National-COVID-Cohort-Collaborative/CS-Rural-Health/blob/9c3ad4103be672bb446012a196662b18e7ddea47/steroid-background/manipulation/concept-set-augment.R#L104

sql_retrieve <-
  "
    SELECT * FROM  concept c
    WHERE c.concept_id = ?
  "

...

# Pull info from OMOP's concept table
cnn <- DBI::dbConnect(drv = RSQLite::SQLite(), dbname = config$path_database)
ds_omop  <- DBI::dbGetQuery(cnn, sql_retrieve, params = list(ds_input$concept_id))
DBI::dbDisconnect(cnn); rm(cnn)