snowflakedb / snowflake-connector-nodejs


Values sent via temp stage are not read correctly #420

Closed haggholm closed 1 year ago

haggholm commented 1 year ago

1. What version of NodeJS are you using (node --version and npm --version)?

$ node --version
v16.18.1
$ npm --version
8.16.0

2. What operating system and processor architecture are you using?

No LSB modules are available.
Distributor ID: Pop
Description:    Pop!_OS 22.04 LTS
Release:    22.04
Codename:   jammy

3. What are the component versions in the environment (npm list)?

(Not relevant.)

4. What did you do?

(Spin-off from https://github.com/snowflakedb/snowflake-connector-nodejs/issues/417)

Our code sends data with potentially large numbers of parameters to Snowflake. In some cases, this triggers the array bind feature, where the driver creates a temporary stage.

The relevant code, lightly edited, is

const query = `
   INSERT INTO some_table
   (id, ${columnName1}, ${columnName2})
   VALUES (:1, :2, :3)`;

await new Promise<void>((resolve, reject) => {
  return client.execute({
    sqlText: query,
    // One bind array per row; a large enough row count sends the driver down
    // the array-bind path, which stages the data in a temporary stage.
    binds: rows.map((row) => [row.id, opType, row.reason]),
    complete: (err) => (err ? reject(err) : resolve()),
  });
});

some_table has three columns of type TEXT. Each row in rows is of the form (hex_string, "added" | "removed" | "changed", descriptive_string), e.g. something like

[
  "108e6e99ee84f1cd9dadff301b8e518b598b01789eba957540088b11c21a7bc6",
  "added",
  "Some row in this batch failed with error: error: invalid input syntax for integer: \"7qfhNsLEmxnlddXp9RXKXVOd0h4\""
]

6. What did you expect to see?

No output; this should just insert data.

7. What did you see instead?

An error from Snowflake.

Found character 'S' instead of record delimiter '\n'
File '76f8c268-30ac-4bb9-8904-efa13344f631/1', line 2, character 73
Row 1 starts at line 1, column "SOME_TABLE"["$3":3]

I don't really know what this is about. It looks to me like the driver is creating a CSV; maybe, given that the input contains double quotes ("), the library is not escaping the input properly? But I don't really know, and I don't know whether this suffices to reproduce the issue.

In one instance, I instead got an error saying it failed to decrypt something, but I didn't think to capture the error message because I assumed it was reproducible, and I thoughtlessly cleared my output window. That was a mistake: I can't seem to reproduce it at all, so I can only mention it anecdotally.

8. Add this to get standard output.

var snowflake = require('./snowflake');
snowflake.configure(
{
  logLevel: 'trace'
});

These instructions seem a bit off; require('./snowflake') isn't going to work. With require("snowflake-sdk").configure({ logLevel: "trace" }), I get this:

scratch_5.txt

sfc-gh-dszmolka commented 1 year ago

hi, thank you for submitting this issue and providing all the details. i could reproduce the same issue thanks to the input data you provided, and during triage i found that it is probably caused by the escaped double quotes (\"), or rather by how the driver treats them. my array-bound input

["e013672f5420d0f1b5c28b7a9873a4c48633b5c82eac9a5e6539a3b08eb45fb3","remove","ah chop wicks ox web row: \"aZgTSKUoXxfHQprbCkGn\""],
["a33423c5ccc1e89354737a5079101755781d58cdb5aad8bf992530cd19e963fe","change","id live for ms gab sear: \"EYQRVFWZuMGUgALIStKN\""]

(mock data but the structure more or less matches yours) was transformed to

e013672f5420d0f1b5c28b7a9873a4c48633b5c82eac9a5e6539a3b08eb45fb3,remove,"ah chop wicks ox web row: ""aZgTSKUoXxfHQprbCkGn""
a33423c5ccc1e89354737a5079101755781d58cdb5aad8bf992530cd19e963fe,change,"id live for ms gab sear: ""EYQRVFWZuMGUgALIStKN""

by the driver when it wrote the temp file before ingestion. That trips up the Snowflake file_format which the driver auto-creates for the temp stage (the stage hosts the temp file for a moment, until it is ingested), as suspected in issue #417:

CREATE_STAGE_STMT = CREATE OR REPLACE TEMPORARY STAGE SYSTEM$BIND file_format=( type=csv field_optionally_enclosed_by='\"')"}
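To make the formatting problem concrete, here is a minimal RFC 4180-style quoting sketch (an illustration only, not the driver's actual code); compare its output with the transformed lines above, which appear to end without the field's closing quote:

// Enclose the field in double quotes and double any embedded double quotes.
function quoteCsvField(value: string): string {
  return '"' + value.replace(/"/g, '""') + '"';
}

// Expected shape of the third field from the mock rows above; note the final
// closing quote, which the driver's transformed output appears to be missing:
// "ah chop wicks ox web row: ""aZgTSKUoXxfHQprbCkGn"""
console.log(quoteCsvField('ah chop wicks ox web row: "aZgTSKUoXxfHQprbCkGn"'));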

all in all, we'll take a look. as for how to work around this issue, two immediate things come to mind:

For the other 2 issues

was very likely Failed to decrypt. Check file key and master key. and is indeed an intermittent, known issue with the bind_uploader (driver versions 1.6.14+), very likely happening when reading/writing to the temporary stage created during the ingest. It is under investigation in #374. A workaround for it, too, would be to increase the arrayBindingThreshold to a higher value, to avoid going down the 'create temporary stage + PUT file before inserting' path.
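As a sketch of that workaround, assuming arrayBindingThreshold is accepted as a connection option in your driver version (the option placement and the placeholder values here are assumptions; check the documentation for your release):

import * as snowflake from "snowflake-sdk";

const connection = snowflake.createConnection({
  account: "my_account",      // placeholder connection details
  username: "my_user",
  password: "my_password",
  database: "my_database",
  // Raising the threshold (the value here is arbitrary) keeps batches below
  // that size on the regular bind path instead of the temp-stage PUT/ingest path.
  arrayBindingThreshold: 200000,
});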

sfc-gh-dszmolka commented 1 year ago

should be fixed with #436. i tested the latest code, which includes PR #436, and the original reproduction which failed (and still fails with the latest release snowflake-sdk@1.6.19) now works with the latest code.

the next release will contain this fix

sfc-gh-dszmolka commented 1 year ago

release snowflake-sdk@1.6.20 is out with the fix from the PR. please upgrade to the fixed version and re-test. if the issue still persists, please reopen the issue or comment with the details and I'll reopen it.

haggholm commented 1 year ago

Unfortunately I don't think we can use parameter binding due to one of the issues I mentioned in https://github.com/snowflakedb/snowflake-connector-nodejs/issues/417, specifically the fact that when binding parameters, client.execute() does not return a Statement object. Unless that has also been fixed?
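For context, the pattern at stake looks roughly like this, following the driver's documented usage where execute() returns a Statement (a sketch reusing the variable names from the snippet above, not a fix):

const statement = client.execute({
  sqlText: query,
  binds: rows.map((row) => [row.id, opType, row.reason]),
  complete: (err, stmt, resultRows) => { /* handle result or error */ },
});
// Callers rely on the returned Statement, e.g. statement.cancel() or
// statement.streamRows(); per the report in #417 it was not returned on the
// array-bind (temp stage) path.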

sfc-gh-dszmolka commented 1 year ago

no, unfortunately #417 is not fixed yet. in fact, i may have missed this part in its error description. until very recently i was under the impression that #417 focuses on the issue caused by not specifying a database in the connection context: depending on the input size, the code might take the 'bind upload' path, and in that case (due to the missing database context) the temp stage cannot be created prior to the insert, so the node process crashes instead of nicely propagating an error to the caller.

if you found that the driver does not return a Statement object from client.execute() when executed in 'bind upload' mode (== the crash scenario), would it be possible to open a new Issue for that one? Thank you in advance!

haggholm commented 1 year ago

I did seem to smash together a collection of issues in #417—sorry about that. A more isolated report of this issue: https://github.com/snowflakedb/snowflake-connector-nodejs/issues/455

sfc-gh-dszmolka commented 1 year ago

thank you so much for this! we'll take a look