snowflakedb / snowflake-connector-nodejs

NodeJS driver
Apache License 2.0

Errors in temp stage are not propagated to caller #417

Closed haggholm closed 1 year ago

haggholm commented 1 year ago

Please answer these questions before submitting your issue. Thanks!

1. What version of NodeJS are you using (node --version and npm --version)?

$ node --version
v16.18.1
$ npm --version
8.16.0

2. What operating system and processor architecture are you using?

No LSB modules are available.
Distributor ID: Pop
Description:    Pop!_OS 22.04 LTS
Release:    22.04
Codename:   jammy

3. What are the component versions in the environment (npm list)?

(Not relevant.)

4. What did you do?

It should be easy to reproduce this by setting up a scenario where (a) the bind threshold is set lower than the number of parameters and (b) no default schema is provided.
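
Roughly, a reproduction sketch might look like the following (placeholder credentials and table name, not our actual code; arrayBindingThreshold is lowered so even a small batch takes the bind-upload path, and no schema is configured):

const snowflake = require('snowflake-sdk');

const connection = snowflake.createConnection({
  account: 'myaccount',        // placeholder
  username: 'myuser',          // placeholder
  password: 'mypassword',      // placeholder
  database: 'MYDB',            // placeholder; deliberately no schema
  arrayBindingThreshold: 10    // lower than the number of rows bound below
});

connection.connect((err) => {
  if (err) throw err;

  // 20 rows of binds > threshold of 10, so the driver takes the bind-upload
  // path and tries to create a temp stage, which fails without a schema.
  const binds = Array.from({ length: 20 }, (_, i) => [String(i), 'a', 'b']);

  connection.execute({
    sqlText: 'INSERT INTO my_table (id, col1, col2) VALUES (:1, :2, :3)',
    binds: binds,
    complete: (err2) => {
      // expected: err2 carries the temp-stage error
      // observed: an uncaught exception kills the process instead
      console.log('complete error:', err2);
    }
  });
});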

5. What did you expect to see?

An error passed to the complete handler passed to connection.execute().

6. What did you see instead?

An uncaught exception.

I can trace the execution as far as https://github.com/snowflakedb/snowflake-connector-nodejs/blob/37318ed0afee2fe3c6a4b648d2a03403ec4a8c61/lib/connection/statement.js#L121, but after that it's not entirely clear where the error is emitted.

I don't know whether this is related, but I was also surprised to find that connection.execute() doesn't return a statement object when it executes in bind mode. It's rather confusing to work with the API when the return type depends somewhat obscurely on input size. https://github.com/snowflakedb/snowflake-connector-nodejs/blob/37318ed0afee2fe3c6a4b648d2a03403ec4a8c61/lib/connection/statement.js#L110

7. Add this to get standard output.

var snowflake = require('./snowflake');
snowflake.configure(
{
  logLevel: 'trace'
});
sfc-gh-dszmolka commented 1 year ago

hi, thank you for submitting this issue and providing the details! indeed this can be easily reproduced the way you mentioned. we'll take a look. in the meantime, as a workaround, you can specify a dummy (but existing) database + schema in the connection context to handle this shortcoming.

or alternatively, you can specify the arrayBindingThreshold parameter when creating the connection. this parameter is available from v1.6.17 and up; its default value is 100000, and it governs how many binds the array binding must reach before the driver takes the 'bind upload' path. as long as the number of binds stays below this threshold, the driver doesn't take that path and therefore won't try to create the temporary stage to copy the files before the insert, so it won't trip up.
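
a rough sketch of both workarounds, with placeholder connection values (illustrative only):

const snowflake = require('snowflake-sdk');

const connection = snowflake.createConnection({
  account: 'myaccount',          // placeholder
  username: 'myuser',            // placeholder
  password: 'mypassword',        // placeholder
  // workaround 1: point the session at any existing database + schema so the
  // temporary stage for the bind upload can actually be created
  database: 'UTIL_DB',
  schema: 'PUBLIC',
  // workaround 2: raise the threshold above the largest batch you expect, so
  // the driver never takes the bind-upload path at all
  arrayBindingThreshold: 1000000
});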

hope some of the above helps while we're debugging this.

haggholm commented 1 year ago

hope some of the above helps while we're debugging this.

Thank you. We ended up going the route of making sure never to hit the threshold by batching the operations (it's pretty rare to need >100k parameters, after all).
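
Roughly, the batching looks like this (an illustrative sketch with a hypothetical chunk size, not our exact code):

const CHUNK_SIZE = 50000; // comfortably below the default threshold of 100000

function chunk(rows, size) {
  const out = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

async function insertAll(connection, sqlText, rows) {
  for (const binds of chunk(rows, CHUNK_SIZE)) {
    await new Promise((resolve, reject) => {
      connection.execute({
        sqlText,
        binds,
        complete: (err) => (err ? reject(err) : resolve())
      });
    });
  }
}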

It turned out that even when providing a schema, the array bind functionality did not work for us; we have a very simple query that looks like

INSERT INTO ${table}
(id, ${column1}, ${column2})
VALUES (:1, :2, :3)

into a table with three text columns.

Initially this failed to create the temp stage due to the lack of a default schema, which is how we discovered the error leak; but when providing a schema, it still failed with an error that looked like

Found character 'S' instead of record delimiter '\n'
  File '20b4281d-2076-4f0c-a7d2-57487c421915/1', line 2, character 73
  Row 1 starts at line 1, column "FOO60_BAR147_BAZ"["$3":3]

and I have no idea why. I should probably open a separate issue? I'm just not sure I understand it well enough to provide useful details, but if that suggests something in the course of investigating this issue, that would of course be grand…

For now, we're just treating the array threshold as a cap on the number of parameters.

sfc-gh-dszmolka commented 1 year ago

at this point, i'm also unsure whether the second issue is related to the original one. generally and functionally speaking, the array bind feature is working. there are some issues already discovered around it, but i don't remember this one, so i think it would be a good idea to take a look.

would it be possible to open a new Issue for this please, to keep things consistent? it would also be extremely helpful if you could provide a code snippet to reproduce it (the Found character 'S' instead of record delimiter '\\n' error, that is). providing the bound data itself (or mock data that represents the actual data) might also help in the investigation; i'd like to rule out whether the file format used in the bind_uploader's temporary stage plays a role or not.

thank you in advance !

haggholm commented 1 year ago

Took a stab at it: https://github.com/snowflakedb/snowflake-connector-nodejs/issues/420

I'm not sure if that's good enough for you to reproduce it, but it's the best I can do at the moment.

sfc-gh-dszmolka commented 1 year ago

https://github.com/snowflakedb/snowflake-connector-nodejs/pull/466

sfc-gh-dszmolka commented 1 year ago

fix released with 1.6.21