zozlak / RODBCext

14 stars 10 forks source link

Testing `varchar(max)` retrieval #6

Closed wibeasley closed 8 years ago

wibeasley commented 8 years ago

Hi @zozlak (and @r2evans), I'm responding to #4 and #5. I'm happy to help with your request to test. Before I get started, can you confirm that this is what you'd like me to do:

  1. Within SSMS, create a fresh table in a fresh database with at least one varchar(max) column.
  2. Within R, populate that column and send it to SQL Server.
    • I'll write a random characters to each cell. How many characters would you like to me to write? 2^10? 2^16?
    • Which function should I use for this? RODBCext::sqlExecute()?
  3. Within R, retrieve the column and see if it matches R's in-memory copy of the data.frame.
    • Which function should I use for this? RODBC::sqlGetResults()?

Since these tests can be run on Travis (b/c SQL Server isn't yet supported for Ubuntu), I'll put the tests in a different folder than /tests/testthat.

BTW, if you need help in the future, consider asking @nutterb if he has time. He's a sharp guy and a fan of RODBCext.

zozlak commented 8 years ago

Thanks for help!

Yes, I would like you to perform a test as you described. Only a few remarks:

zozlak commented 8 years ago

And if you wonder why varchar(max) will retrieve only 255 characters and varchar(anyNumber), e.g. varchar(100000) will retrieve up to 100k characters, the answer is:

nutterb commented 8 years ago

I had a two week long ordeal with this issue as I was starting to work with RODBCext. When I finally understood that it was an issue in the SQL Server driver ( specifically, I think it is the SQL Server 11.0 Native Client driver) I had to find a way to work around it.

My solution comes in there policies

  1. Never use Varchar (max) in my databases ( though I understand this may not be realistic for existing databases. I'm just fortunate to have control over such matters)
  2. Use varchar (8000) if I really need a long character string. More often than not, this will suffice.
  3. If varchar (8000) isn't enough, use a varbinary (max). This requires some finagling to convert your character string to a binary string, but it isnt terribly difficult to do and SQL Server does handle the varbinary (max) type pretty well.

Again, this doesn't really help with established databases. If you really need to get out a varchar(max), I think you will have to do some ugly contortions with a stored procedure.

I can run these tests on my instance of SQL Server as well. We have s couple of differect drivers installed on our machines that don't always behave the same way).

wibeasley commented 8 years ago

@zozlak, I believe this is what you wanted me to test, and that it confirms your suspicions. Notice I tried with the more recent ODBC driver too: 13.1.

The results can be viewed in this mark down. Tell me if you'd like to me to try something else. https://github.com/wibeasley/RODBCext/blob/master/tests/manual/stitched-output/test-varchar.md

@nutterb, if you're recreating something in the future, notice the .sql script to create the table, in addition to the .R file to run the tests. If you create your own DSN an point it to that test database, I think you should be able to replicate everything. I'll create a PR now, so it's more unified in the master branch.

zozlak commented 8 years ago

The 8000 characters test copying only 7999 characters looks a little suspicious (like I forgot to allocate one more byte in the buffer for the terminating \0 character [C internals]).

I will check it, probably on Sunday.

If I will introduce any changes, I will ask you to run tests again and the final result will be the 0.2.7 release ("a correct support for string data types reporting 0 length")

nutterb commented 8 years ago

Just ran these on my configuration under three drivers. Results were identical to @wibeasley

dsn_11_0 <- "rodbcext-test-11-0" #ODBC Driver 6.01.7601.17514; SQLSRV32.DLL 20-NOV-2010
dsn_odbc_11 <- "rodbcext-test-odbc-for-sql-11" #ODBC Driver 2014.120.4219.00; MSODBCSQL11.DLL 21-DEC-2015
dsn_sql_server <- "rodbcext-test-sql-server" # ODBC Driver 2011.110.3000.00; SQLNCLI11.DLL 20-OCT-2012
zozlak commented 8 years ago

I checked the RODBCext part and it seems that buffers' length is set correctly in RODBCext.

This suggests that the "last character being lost for columns longer then 255 chars" error is caused by the RODBC (the data retrieval step is performed by RODBC's functions).

To be investigated when I will have a few days of free time (#8).

@nutterb , @wibeasley Thank you for your help!