snowflakedb / snowflake-connector-net

Snowflake Connector for .NET
Apache License 2.0

SNOW-1786191: SNOW-1747415 High Memory Usage `SFReusableChunk` #1055

Open hadighattas opened 3 weeks ago

hadighattas commented 3 weeks ago
  1. What version of .NET driver are you using? 4.1.0. I also tried building and using the latest version of master, including the fixes mentioned in https://github.com/snowflakedb/snowflake-connector-net/issues/1004

  2. What operating system and processor architecture are you using? macOS 14.7 ARM

  3. What version of .NET framework are you using? .NET Standard 2.0

  4. What did you do?

Pulling a lot of data is memory intensive. We tried pulling 100M rows, and memory usage averages around 800-900 MB for a unit test; forcing garbage collection does not change anything. The test uses the Dapper unbuffered API, which fully supports streaming. The memory profiler indicates that almost all allocated objects are in `SFReusableChunk`'s `BlockResultData`.

Profiler screenshots: [four screenshots omitted]

Reproducing this issue is pretty straightforward; we pulled 100M records using this query against the sample data (a minimal sketch of the test follows the query below):

```sql
SELECT * from snowflake_sample_data.TPCDS_SF100TCL.inventory
```
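For reference, here is a minimal sketch of what our test does, assuming Dapper and the Snowflake.Data package are referenced; the connection string values are placeholders and the row handling is illustrative:

```csharp
using System;
using Dapper;
using Snowflake.Data.Client;

class ChunkMemoryRepro
{
    static void Main()
    {
        using var conn = new SnowflakeDbConnection();
        // Placeholder credentials; fill in a real account/user/password.
        conn.ConnectionString = "account=...;user=...;password=...";
        conn.Open();

        long count = 0;
        // buffered: false makes Dapper stream rows one at a time instead of
        // materializing the whole result set, so steady-state memory should be
        // dominated by the driver's own chunk buffers (SFReusableChunk).
        foreach (var row in conn.Query(
            "SELECT * from snowflake_sample_data.TPCDS_SF100TCL.inventory",
            buffered: false))
        {
            count++; // consume and discard each row
        }
        Console.WriteLine($"rows: {count}");
    }
}
```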
  5. What did you expect to see?

Memory buffer sizes (chunks) should be more conservative/configurable.

  6. Can you set logging to DEBUG and collect the logs?

Reproducing this is pretty straightforward; I can do it if necessary.

sfc-gh-dszmolka commented 3 weeks ago

hi - this issue looks awfully similar to ones we already have, because the issue is coming from the backend; currently the chunk sizes are not really configurable, and if you're retrieving a huge amount of data it will be memory intensive (or crash with OOM, depending on how limited the memory is)

You can give setting CLIENT_RESULT_CHUNK_SIZE=16 a shot ([reference](https://docs.snowflake.com/en/sql-reference/parameters#client-result-chunk-size)), but I can imagine it won't change the situation very much.
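If you want to try it, one way to apply the parameter is via `ALTER SESSION`; a sketch assuming an already-open `SnowflakeDbConnection` named `conn` (per the linked docs the value is in MB):

```csharp
using (var cmd = conn.CreateCommand())
{
    // CLIENT_RESULT_CHUNK_SIZE is a Snowflake session parameter (size in MB);
    // 16 is just the value suggested above, not necessarily an optimum.
    cmd.CommandText = "ALTER SESSION SET CLIENT_RESULT_CHUNK_SIZE = 16";
    cmd.ExecuteNonQuery();
}
```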

What you can use as a workaround while the backend issue is sorted out, and which should work, is adding LIMIT ... OFFSET ... to your SELECT to 'paginate' through the huge result set, if you need to keep memory usage under a certain level.
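A hedged sketch of that workaround, reusing the open connection from above; the page size and ORDER BY columns are illustrative (OFFSET paging needs a stable ordering to avoid skipped or duplicated rows):

```csharp
const long pageSize = 1_000_000; // illustrative; tune to your memory budget

for (long offset = 0; ; offset += pageSize)
{
    long rowsInPage = 0;
    // Streaming each page keeps only one page's chunks alive at a time.
    foreach (var row in conn.Query(
        "SELECT * from snowflake_sample_data.TPCDS_SF100TCL.inventory " +
        "ORDER BY inv_date_sk, inv_item_sk, inv_warehouse_sk " + // stable order; columns are illustrative
        $"LIMIT {pageSize} OFFSET {offset}",
        buffered: false))
    {
        rowsInPage++; // process the row here
    }
    if (rowsInPage < pageSize) break; // short page => last page
}
```

Note that OFFSET paging makes the server re-evaluate the query for each page, so this trades total latency for a lower client memory ceiling.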