Closed mach-kernel closed 6 months ago
Hello @mach-kernel ,
Yeah, this is IMHO a bug in the Simba drivers: they should report an upper bound. I can roughly understand why they implemented it this way, though; many software packages cannot deal with really large buffers. To my knowledge it is even worse, however: judging indirectly from logs in bug reports submitted to me, the Simba drivers do not even handle truncation correctly. This can lead to silent data loss. Not the best option.
However, if I remember correctly, there may be an option which can be passed in the connection string that allows you to replace the default reported upper buffer size of 255 with a value of your choosing.
Best, Markus
I have to reject this PR. For ODBC drivers which are implemented correctly, the user needs the ability to define an upper bound, not a lower bound. Reversing the meaning, as done here, would break a lot of code.
I am open to giving the user fine-grained control over the buffer size for individual columns.
Markus,
Thank you very much for pointing us in the right direction! If anybody else runs into this, the parameters are:

- `DefaultStringColumnLength`
- `BinaryColumnLength`
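For anyone wiring this up, here is a minimal sketch of how these parameters could be appended to the connection string before handing it to the crate (the DSN name and the chosen lengths are illustrative; `odbc_api::Environment::connect_with_connection_string` would accept the resulting string):

```rust
/// Build a connection string that overrides the Simba driver's default
/// reported column sizes. The values below (1 MiB) are examples; pick
/// whatever upper bound fits your data.
fn connection_string_with_column_lengths(dsn: &str) -> String {
    format!(
        "DSN={};DefaultStringColumnLength=1048576;BinaryColumnLength=1048576",
        dsn
    )
}

fn main() {
    // Hypothetical DSN name; pass the result to e.g.
    // odbc_api::Environment::connect_with_connection_string.
    let conn_str = connection_string_with_column_lengths("Databricks");
    println!("{conn_str}");
}
```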
Hello @mach-kernel ,
you are welcome. Thanks for sharing your findings!
Hello! Thank you for these crates; it was really easy to get up and running!!
I am trying to read an ODBC data source (Simba/Databricks) from a columnar store that has a particularly large column (a file, potentially several megabytes). Unfortunately, something is wrong with the result set metadata: the reported column size is too small for the values returned. This also happens with large text columns (e.g. base64 representations of bytes):
Bin case:
Text case: the result set metadata is e.g. `Varchar(255)` for 10k+ characters of text.

EDIT: Bummer, looks like the driver-provided metadata is incorrect:

Query: `select raw_email from messages`
Is it correct to assume the metadata is coming from `sql_describe_col`, and by extension from the driver? I understand the semantics as written are to choose an upper bound (which this change undoes for large results with accurate sizes), but I wanted to ask if there could be an affordance for choosing the larger of the two (maybe a `Quirk` like `result_set_alloc_max_sizes`)? With this change I am able to read result batches fine, but in the case that the buffer is too small, we cannot proceed. I will try to assemble some test case data; I figure this is enough to start the conversation 😸.
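To make the proposal concrete, here is a minimal sketch of the buffer-sizing idea (the function and flag names are hypothetical, not part of the crate's API): with the quirk enabled, allocation would take the larger of the driver-reported column size and a user-configured length, instead of treating the configured value purely as a cap.

```rust
/// Sketch only: pick the per-column buffer length.
/// `reported` is what the driver claims via column metadata;
/// `configured` is a user-supplied length. With `alloc_max_sizes`
/// set, take the larger of the two; otherwise keep the current
/// upper-bound semantics and cap at `configured`.
fn effective_column_length(reported: usize, configured: usize, alloc_max_sizes: bool) -> usize {
    if alloc_max_sizes {
        reported.max(configured)
    } else {
        reported.min(configured)
    }
}

fn main() {
    // Driver reports Varchar(255), but actual values are ~10 KiB.
    assert_eq!(effective_column_length(255, 10_240, true), 10_240);
    // Default semantics: configured value acts as an upper bound.
    assert_eq!(effective_column_length(10_240, 255, false), 255);
}
```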