snowflakedb / snowflake-connector-nodejs

NodeJS driver
Apache License 2.0

SNOW-1703685: High Memory Usage during PUT Query execution for Large GZIP compressed CSV files #922

Open kartikgupta2607 opened 1 month ago

kartikgupta2607 commented 1 month ago

Please answer these questions before submitting your issue. In order to accurately debug the issue this information is required. Thanks!

  1. What version of NodeJS driver are you using? 1.9.3

  2. What operating system and processor architecture are you using? MacOS arm64

  3. What version of NodeJS are you using? (node --version and npm --version) node : 18.12.1 , npm: 8.19.2

  4. What are the component versions in the environment (npm list)? NA

  5. Server version: 8.9.1

  6. What did you do?

    Issue Summary

    While executing a PUT query to stage a large, compressed CSV file from the local file system to a Snowflake stage (S3), the memory usage of the snowflake-sdk grows significantly, especially with large files. During the execution, the Snowflake SDK performs several operations:

      1. Compression (if the file is not already compressed),

      2. SHA-256 Digest Calculation,

      3. AES Encryption,

      4. Upload to S3 (or other remote storage).

While these steps are necessary, the SDK's memory footprint grows significantly based on the file size, which appears to be due to the following reasons:

Steps to Reproduce:

While the query executes, monitor memory usage with a tool such as Node.js process memory logging, clinic doctor, or any external memory profiler.
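One way to do the process memory logging mentioned above is to sample `process.memoryUsage()` on a timer while the PUT runs and keep the peak values. The sketch below is only an illustration: `withMemorySampling` and `toMiB` are hypothetical helper names, not part of snowflake-sdk, and `work` stands for whatever async function wraps the PUT execution.

```javascript
// Hypothetical helpers (not part of snowflake-sdk).

// Formats a byte count as mebibytes with one decimal place.
function toMiB(bytes) {
  return (bytes / 1024 / 1024).toFixed(1);
}

// Runs `work` (an async function) while sampling process memory every
// `intervalMs` milliseconds, and resolves with the result plus peak values.
async function withMemorySampling(work, intervalMs = 500) {
  const peak = { rss: 0, heapUsed: 0, external: 0, arrayBuffers: 0 };
  const sample = () => {
    const usage = process.memoryUsage();
    for (const key of Object.keys(peak)) {
      peak[key] = Math.max(peak[key], usage[key]);
    }
  };

  sample(); // baseline before the work starts
  const timer = setInterval(sample, intervalMs);
  try {
    const result = await work();
    return { result, peak };
  } finally {
    clearInterval(timer);
    sample(); // one last sample after the work settles
  }
}
```

To use it, wrap the driver's callback-based `connection.execute({ sqlText, complete })` call in a Promise, pass that as `work`, and print the peaks with `toMiB(peak.rss)` and `toMiB(peak.arrayBuffers)`. Large `Buffer` allocations show up under `external` and `arrayBuffers` rather than `heapUsed`, so those fields are worth watching for file-sized reads.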

  7. What did you expect to see?

    • Ideally, the SDK should minimise memory consumption by using a streaming approach for both the digest calculation and file upload steps. This would help in handling large files more efficiently.

  8. Can you set logging to DEBUG and collect the logs? No

  9. What is your Snowflake account identifier, if any? (Optional)
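To illustrate the streaming approach requested above, here is a minimal sketch using only Node.js built-ins. It is not the driver's code: `sha256OfFile` and `encryptFile` are hypothetical names, and AES-128-CBC with a caller-supplied key and IV is an assumption made for the example, not a statement of what the SDK uses. Both functions read the file chunk by chunk, so memory stays close to the stream's buffer size regardless of file size.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper: computes a base64 SHA-256 digest by feeding the hash
// one chunk at a time instead of reading the whole file into a Buffer.
async function sha256OfFile(filePath) {
  const hash = crypto.createHash('sha256');
  for await (const chunk of fs.createReadStream(filePath)) {
    hash.update(chunk);
  }
  return hash.digest('base64');
}

// Hypothetical helper: encrypts srcPath into destPath by piping the read
// stream through a cipher stream. The cipher choice (aes-128-cbc) is an
// assumption for illustration; key and iv must each be 16 bytes.
async function encryptFile(srcPath, destPath, key, iv) {
  const cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
  await pipeline(
    fs.createReadStream(srcPath),
    cipher,
    fs.createWriteStream(destPath)
  );
}
```

The same pattern would extend to the upload step if the storage client accepts a readable stream as the request body, in which case the encrypted stream could be handed over directly instead of being written to a temporary file first.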

sfc-gh-dszmolka commented 1 month ago

Thank you for raising this enhancement request with us. We'll consider it for the future roadmap (with no timeline commitment). We really appreciate the details you provided and the suggestions!