Open krocodl opened 1 year ago
BTW, there is an error in the documentation: CLIENT_MEMORY_LIMIT is not a global maximum memory usage limit for all queries. It is a property of the SnowflakeChunkDownloader, and each SnowflakeResultSetSerializableV1 (i.e. each query under execution) has its own instance of this class.
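To make the per-query nature of the limit concrete, here is a minimal sketch (the account URL, the placeholder credentials, and the assumption that CLIENT_MEMORY_LIMIT can be passed as a connection property are mine, not part of the original report): two queries running in parallel each get their own SnowflakeChunkDownloader, so together they can buffer roughly twice the configured limit.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PerQueryLimitDemo {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "<user>");             // placeholder credentials
        props.put("password", "<password>");
        // Assumption: the limit can be passed as a connection property (value in MB).
        props.put("CLIENT_MEMORY_LIMIT", "512");

        Runnable query = () -> {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("select * from long_table")) {
                while (rs.next()) {
                    // drain the result set, forcing all chunks to be downloaded
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };

        // Each running query has its own SnowflakeChunkDownloader, so two parallel
        // downloads can together buffer roughly 2 * CLIENT_MEMORY_LIMIT of chunk data.
        Thread t1 = new Thread(query);
        Thread t2 = new Thread(query);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```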
@sfc-gh-wfateem could you please triage this issue?
hi all - unfortunately this is a limitation across all of the Snowflake drivers and likely will need server-side changes too. The relevant (multiple) teams are aware, and scoping and prioritizing this issue is in progress. No timeline can be promised at this point though, unfortunately.
However, if you are already a Snowflake customer, please do reach out to your Account Team and explain how important this change would be for you. This might give the effort some traction and help the involved teams prioritize. Thank you in advance!
Driver net.snowflake:snowflake-jdbc:3.14.2
First of all, we need to prepare a table with 10,000,000 rows. We can do that with the following SQL:
```sql
create or replace table long_table as
select row_number() over (order by null) as id
from table(generator(rowcount => 10000000));
```

After that we can implement the test, which tries to download a lot of simulated data with minimal memory usage:
```java
@Test
public void bigDataTest() throws Exception {
    int colSize = 1024;
    int colCount = 12;
```
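The original snippet is truncated at this point, so here is a minimal sketch of what such a test might look like. The connection URL, the placeholder credentials, the use of randstr()/random() to inflate the rows, and the specific CLIENT_MEMORY_LIMIT / CLIENT_RESULT_CHUNK_SIZE values are assumptions, not taken from the original report.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import org.junit.Test;

public class BigDataMemoryTest {

    @Test
    public void bigDataTest() throws Exception {
        int colSize = 1024;  // width of each simulated text column
        int colCount = 12;   // number of simulated columns per row

        Properties props = new Properties();
        props.put("user", "<user>");                  // placeholder credentials
        props.put("password", "<password>");
        props.put("CLIENT_MEMORY_LIMIT", "100");      // MB, assumed per-query budget
        props.put("CLIENT_RESULT_CHUNK_SIZE", "16");  // MB, assumed requested chunk size

        // Inflate every row of long_table to roughly colSize * colCount bytes.
        StringBuilder sql = new StringBuilder("select id");
        for (int i = 0; i < colCount; i++) {
            sql.append(", randstr(").append(colSize).append(", random()) as col").append(i);
        }
        sql.append(" from long_table");

        try (Connection con = DriverManager.getConnection(
                "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
             PreparedStatement ps = con.prepareStatement(sql.toString())) {

            ps.setMaxRows(Integer.MAX_VALUE);

            long rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;  // only count rows; never hold the payload in memory
                }
            }
            System.out.println("fetched rows: " + rows);
        }
    }
}
```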
The number of chunks that will be downloaded can be observed in the variable SFStatement#executeQueryInternal#result. Its value is a JSON document that contains the list of chunk URL references under the "data.chunks" path. The size of this list is what interests us: the larger the list, the smaller each individual chunk and the less memory is used while retrieving the data. The following data was obtained with different values of ps.setMaxRows():
ps.setMaxRows(Integer.MAX_VALUE); => data.chunks.size = 2453
At the same time, we see that with ps.setMaxRows(Integer.MAX_VALUE) and ps.setMaxRows(-1), the value of the CLIENT_RESULT_CHUNK_SIZE configuration parameter is not taken into account at all when calculating the number of chunks.
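For reproduction, a minimal sketch of the comparison might look like the following. It assumes an already opened connection `con` (with the same java.sql imports as in the sketches above) and that CLIENT_RESULT_CHUNK_SIZE can be changed via ALTER SESSION; the concrete values are illustrative.

```java
// Assumed: `con` is an open java.sql.Connection to the same account as above.
try (Statement session = con.createStatement()) {
    session.execute("alter session set CLIENT_RESULT_CHUNK_SIZE = 16"); // MB
}

try (PreparedStatement ps = con.prepareStatement("select * from long_table")) {
    // With a finite cap, the chunk size parameter influences the resulting chunk count.
    ps.setMaxRows(10_000_000);
    // With Integer.MAX_VALUE (or -1) the parameter is reportedly ignored:
    // ps.setMaxRows(Integer.MAX_VALUE);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // drain the result set
        }
    }
}
```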
The following graphs illustrate the actual memory consumption in two fundamentally different cases (data.chunks.size = 2423 and data.chunks.size = 469).
In the second case we receive OOM from time to time due to the re-downloading of chunks. This is another problem: while an incorrectly received chunk is being re-requested, the original chunk is still held in memory, so the actual consumption will be CLIENT_RESULT_CHUNK_SIZE * 3 instead of the stated CLIENT_RESULT_CHUNK_SIZE * 2 for the given values of these variables.