zozlak / RODBCext

14 stars 10 forks source link

Feature Request: Add functions to get and set SQL connection and statement attributes (e. g. timeout values) #10

Closed aryoda closed 7 years ago

aryoda commented 7 years ago

Summary

To be able e. g. to

  1. set the connection timeout in seconds
  2. set the statement (sql command) timeout in seconds

some generic functions should be added to get and set ODBC attributes.

Details

The new functions should be added to the RODBC package normally.

But:

Instead of starting a new package the RODBCext package is a good place to implement this feature request.

RODBCext uses RODBC connection handles, so code working with RODBC should also simply work with RODBCext.

Implementation hints

The implementation should be quite straightforward (similar to "SEXP RODBCSetAutoCommit(SEXP chan, SEXP autoCommit)" function in the file "RODBC.c".

To change the timeout values the ODBC API offers two functions:

  1. SQLSetConnectAttr()

https://msdn.microsoft.com/en-us/library/ms713605(v=vs.85).aspx

Attribute to set SQL_ATTR_CONNECTION_TIMEOUT

An SQLUINTEGER value corresponding to the number of seconds to wait for any request on the connection to complete before returning to the application. The driver should return SQLSTATE HYT00 (Timeout expired) anytime that it is possible to time out in a situation not associated with query execution or login.

  1. SQLSetStmtAttr()

https://msdn.microsoft.com/en-us/library/ms712631(v=vs.85).aspx

Attribute to set: SQL_ATTR_QUERY_TIMEOUT (ODBC 1.0)

An SQLULEN value corresponding to the number of seconds to wait for an SQL statement to execute before returning to the application. If ValuePtr is equal to 0 (default), there is no timeout. If the specified timeout exceeds the maximum timeout in the data source or is smaller than the minimum timeout, SQLSetStmtAttr substitutes that value and returns SQLSTATE 01S02 (Option value changed). Note that the application need not call SQLCloseCursor to reuse the statement if a SELECT statement timed out. The query timeout set in this statement attribute is valid in both synchronous and asynchronous modes.

aryoda commented 7 years ago

@zozlak As discussed I will try to implement and contribute this myself.

Therefore I will make a fork and submit a pull request when I am ready.

I hope to have time for that soon (I think within the next months)...

aryoda commented 7 years ago

Status update on this feature request:

I have decided to implement only the query timeout feature in RODBCext since it does not require to change RODBC.

I will not implement the connection timeout feature in RODBCext (and probably not at all since it is not that important) since it would require to change RODBCs odbcConnect and odbcDriverConnect functions to add a new connection.timeout parameter.

Since I did not get any response from the `RODBC' package maintainer I won't waste time on that.

You can find the current state of implementation in my forked RODBCext repo in a separate feature branch: https://github.com/aryoda/RODBCext/tree/fb-stmt-timeout

I will send a pull request after having finished my tests.

zozlak commented 7 years ago

I took a look at your implementation and on the ODBC specification and I would prefer to implement it in a more generic way by:

This will allow to support other attributes in the future without adding more and more functions/parameters.

With the attribute names I propose to use (case insensitive) ODBC attribute names with skipped "SQLATTR" prefix.

And what do you think?

aryoda commented 7 years ago

Regarding sqlPrepare() and sqlExecute():

I would prefer the ... argument in the function signatures Instead of a list if all attributes shall be supported in the future, so that a function call looks even more natural, e. g.:

sqlPrepare(con, "query string", query_timeout = 10, attr_x = 1, attr_y = 2) # and so on

From a usability point-of-view I dislike function signatures that do not support me with "intellisense" hints about the allowed parameters (like ... or the attr = list() options), that is why I have added a "hard coded" parameter.

So we have currently three options:

  1. Add every newly supported ODBC attribute as a normal parameter at the end (as I currently did)
  2. Use only one parameter which is a list of key/value pairs (as you proposed)
  3. Use the ... parameter for an open ended number of arguments in a natural function call syntax

I am open for every one of this solutions, but would prefer 1 for usability reasons.

Your choice now... :-)

zozlak commented 7 years ago

Passing attributes as a list have two advantages for me:

To provide code completion we can create an additional function odbcAttr() with all supported attributes listed as its parameters.

So the code sample code will look like

sqlPrepare(con, "query string", odbcAttr(query_timeout = 10, attr_x = 1, attr_y = 2))

aryoda commented 7 years ago

Regarding a generic odbcGet/SetAttr():

The corresponding ODBC functions are heavily "overloaded" (verifying correct actual parameter values seems complicated at [my] first glance) so that I am afraid of crashes (NPE...) if I do not carefully check all allowed dependencies.

And also here a generic function signature renders intellisense auto-completion of code in IDEs like RStudio less helpfull (to choose from the allowed attribute names).

Please give me some time to investigate the SQLSetStmtAttr() ODBC-API function and their attributes to get a more solid opinion about the feasibility of a generic function...

I also have to consider the problem that different ODBC drivers can add arbitrary DB-specific attributes names.

With the attribute names I propose to use (case insensitive) ODBC attribute names with skipped "SQLATTR" prefix.

Agreed!

aryoda commented 7 years ago

To provide code completion we can create an additional function odbcAttr() with all supported attributes listed as its parameters. sqlPrepare(con, "query string", odbcAttr(query_timeout = 10, attr_x = 1, attr_y = 2))

Good idea, this is option number 4!

From a user perspective I think the complexity is still higher (the allowed list of parameter or list element names has only moved from sqlPrepare() to odbcAttr().

I am really missing enumerations in R for a list of allowed values/names (would be option number 5 :-), e. g. via an environment like

StmtAttr <- new.env()
StmtAttr$query_timeout <- "QUERY_TIMEOUT"
...
sqlPrepare(con, "query string", StmtAttr.query_timeout = 10)

But I see no advantage over your proposed "odbcAttr" function.

zozlak commented 7 years ago

From a user perspective I think the complexity is still higher (the allowed list of parameter or list element names has only moved from sqlPrepare() to odbcAttr().

If you need to explicitly set statement attributes, then your use case is (more or less) advanced and it does not make much difference but if you are a beginner trying to run your first query it is probably easier to see in the documentation only one parameter "attr - (advanced) statement attributes - see odbcAttr()" then the list of all supported attributes you do not need to care about at the moment.

Another drawback of using many parameters is making code maintenance harder - when you want to add support for another you need to modify many functions and with the odbcAttr() approach you will always take care only about odbcAttr() and odbcSetAttr().

zozlak commented 7 years ago

verifying correct actual parameter values seems complicated at [my] first glance so that I am afraid of crashes (NPE...) if I do not carefully check all allowed dependencies.

I completely agree that odbcSetAttr() should support only limited set of attributes (here and now only the timeout) because they need to be properly checked (and also there is no nice way to map attribute name given as string to the corresponding numeric value in the odbc.h). By proposing "a generic odbcSetAttr()" I mean "a function easy to extend" not "a funciton able to deal with any ODBC statement attribute".

aryoda commented 7 years ago

OK, thanks for your feed back, I really understand your arguments!

Please let me do my "homework" and

  1. classify all ODBC statement attributes that make sense to be covered by statement related functions in RODBCext now or later
  2. collect statement related function signatures of other APIs (JDBC, .Net etc.) to compare against those parameters.

I will come back then and write a short summary here as a base for a decision.

Give me a one or two weeks time for that.

zozlak commented 7 years ago

No problem, take your time and many thanks for so much involvement!

aryoda commented 7 years ago

I have classified the ODBC statement attributes now and also collected major statement related function signatures of other APIs:

https://github.com/aryoda/RODBCext/blob/documentation/internal_doc/ODBC%20SqlSetStmtAttr%20-%20attribute%20list.ods

Question 1: How many ODBC attributes could be added at most to the statement related functions of RODBCext as parameters:

Result:

  1. query timeout (already implemented in my pull request)
  2. Parse escaped symbols in SQL strings (good candidate for the future)
  3. Max returned length of character or binary column (should better be a global configuration IMHO)
  4. Max number of rows to return (should be better a global configuration or solved via SQL syntax)

IMHO I would expect only two arguments: "query timeout" and "parse escaped symbols", but at most there could be four additional arguments in the end.

Question 2: Other DB APIs:

Result: Statement related arguments are

  1. connection handle
  2. SQL query string
  3. timeout value
  4. number of rows to fetch
  5. transaction handle
  6. Encryption settings
  7. Max field size (but only set as attribute of the statement class, not as argument of the class operation)
  8. Escape processing (= parse escaped symbols)
  9. Concurrency (read only, updatable...)
  10. unclassified driver specific arguments.

1 and 2 already exist, 3 is my pull request, 8 could be an argument in the future, 4 and 7 should be configured globally, 5 and 9 are unclear to me (how could this be done in ODBC?), 6 and 10 could be solved with a ... parameter (driver specific arguments).

I therefore vote for adding the query timeout as normal parameter (and the "parse escaped symbols" as another argument in the future if requested by somebody) to keep the API simple and straight forward.

@zozlak Please make up your own opinion and decide what I shall do. You are the package maintainer and I will follow your design decision by adjusting my pull request accordingly.

zozlak commented 7 years ago

Ok, lets make it a normal parameter then.

I will test your pull request, merge it and send the new package version to the CRAN. All in all it can take a few weeks.

Please leave the issue opened at the moment, I will close it with a commit merging your pull request.

zozlak commented 7 years ago

RODBCext 0.3.0 is on CRAN.

Many thanks for your involvement @aryoda !