snowflakedb / pdo_snowflake

PHP PDO driver for snowflake
Apache License 2.0

Large memory usage (support for unbuffered queries) #318

Open goetas opened 1 year ago

goetas commented 1 year ago
  1. What version of PDO/PHP driver are you using? PHP 8.2, pdo_snowflake 1.2.6

  2. What operating system and processor architecture are you using? Debian buster

  3. What version of C/C++ compiler are you using? gcc version 8.3.0 (Debian 8.3.0-6)

  4. What did you do?

    $pdo = new PDO(
        'snowflake:account=XXX',
        'XXX',
        'XXX'
    );

    // generate 2,000,000 rows server-side
    $query = 'select 1 from table (generator(rowcount => 2000000))';
    $r = $pdo->query($query);

    foreach ($r as $row) {
      // no op: iterate without keeping any rows around
    }
    var_dump(memory_get_peak_usage() / 1024 / 1024); // ~8 MB
  5. What did you expect to see?

    PHP, as expected, reported very low memory usage (~8 MB), but the total memory used by the process peaked at 2 GB (reported by top as VIRT).

My guess is that the memory is used (and not released?) by the PHP extension for buffering/transferring the result data (see the sketch after the template questions for a way to observe this gap).

  6. Can you set logging to DEBUG and collect the logs?

    https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors

  7. What is your Snowflake account identifier, if any? (Optional) gqa33272
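
As mentioned above, here is one way to observe the gap between the two numbers: memory_get_peak_usage() only tracks PHP's own allocator, while memory allocated by a C extension (such as the driver's result buffer) is visible only at the process level. A minimal Linux-only sketch (reading /proc is a platform assumption, not part of the driver), run after iterating the result set:

    // PHP-side peak: only what PHP's own allocator handed out
    printf("PHP peak:     %.1f MB\n", memory_get_peak_usage() / 1024 / 1024);

    // process-side peak: VmPeak is what top reports as VIRT (Linux only)
    $status = file_get_contents('/proc/self/status');
    preg_match('/VmPeak:\s+(\d+) kB/', $status, $m);
    printf("Process peak: %.1f MB\n", $m[1] / 1024);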

goetas commented 1 year ago

OK, this seems to be a PDO-level thing... https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php

Is there any chance of offering such a feature for the Snowflake PDO driver, since Snowflake mainly holds large datasets?

AV-GregMayes commented 1 year ago

Also running into this issue. With MySQL you can do $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); to make it use unbuffered queries. It would be great to have an equivalent in Snowflake, if it doesn't already exist.
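
For reference, the MySQL pattern looks like this (a sketch; the DSN, credentials, and big_table are placeholders). With buffering off, PDO streams rows from the server as they are fetched instead of materializing the entire result set in the driver first:

    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    // disable result buffering for this connection
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $stmt = $pdo->query('SELECT * FROM big_table');
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // rows come off the wire on demand; client memory stays flat
    }

The documented trade-off is that the connection stays busy until the result set is fully read, so no other statement can run on it in the meantime.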

goetas commented 1 year ago

With Postgres, it is possible to use FETCH FORWARD 5 FROM cursor_name;... but in Snowflake, FETCH cannot be used outside of a stored procedure (and it always fetches a single row).
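
For comparison, the Postgres cursor pattern referred to above looks roughly like this through PDO (a sketch; the DSN, cursor name, table, and batch size are placeholders; server-side cursors require an open transaction):

    $pdo = new PDO('pgsql:host=localhost;dbname=test', 'user', 'pass');
    $pdo->beginTransaction(); // DECLARE CURSOR only works inside a transaction
    $pdo->exec('DECLARE big_cursor CURSOR FOR SELECT * FROM big_table');

    do {
        // pull the next batch of rows from the server-side cursor
        $batch = $pdo->query('FETCH FORWARD 1000 FROM big_cursor')->fetchAll(PDO::FETCH_ASSOC);
        foreach ($batch as $row) {
            // only one batch is ever held in client memory
        }
    } while (count($batch) > 0);

    $pdo->exec('CLOSE big_cursor');
    $pdo->commit();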

sfc-gh-dszmolka commented 6 months ago

Hi - apologies for leaving this unattended for so long; we're changing that going forward.

We'll consider this enhancement request, and I'll keep this thread posted on any progress. In the meantime, you can consider using LIMIT .. OFFSET .. as a workaround and 'paginate' through very large query results in multiple iterations to lower memory usage, as sketched below.
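
A sketch of that workaround (the table, key column, and page size are placeholders; a stable ORDER BY key is needed so pages don't overlap between iterations, and note that each page re-executes the query):

    $pdo = new PDO('snowflake:account=XXX', 'XXX', 'XXX');
    $pageSize = 100000;
    $offset = 0;

    do {
        $stmt = $pdo->query(
            "SELECT id, payload FROM big_table ORDER BY id LIMIT $pageSize OFFSET $offset"
        );
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        foreach ($rows as $row) {
            // process the row; at most one page is buffered at a time
        }
        $offset += $pageSize;
    } while (count($rows) === $pageSize);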

goetas commented 6 months ago

Yeah, that is what we are doing, but if a query takes a long time to complete, running it multiple times with LIMIT/OFFSET makes things worse.