php / php-src

The PHP Interpreter
https://www.php.net
Other
37.93k stars 7.73k forks source link

Pdo\pgSql does not fetch() efficiently #15287

Open outtersg opened 1 month ago

outtersg commented 1 month ago

Description

It would be expected that:

<?php
$statement = $db->query("select * from that_table");
while(($row = $statement->fetch()))
    fputcsv(…, $row);

be an efficient way to dump a table with PHP being a simple no-store pass-through.

But in fact fetch is not lazy as one could hope: the whole resultset is loaded in the query() before the script gets a chance to read the first rows.

E.g.: on a 24 million entries table of ~ 35 bytes per row, php consumes 2,5 GB of RAM during the whole process (that is, 3 bytes per effective data byte. It could have been worse).

In addition to this memory consumption, this prevents maximizing throughput, as we first have to wait the whole network part to finish, then only we do the whole disk I/O.

Cursors?

Here and there I see suggestions to use (server-side) cursors. But:

Client laziness implementation?

libpq official docs says that PQexec is too high-level for that, and directs to PQsetSingleRowMode after a PQsendQuery.

As laziness modifies the script's responsibilities (error handling is probably different), this should not be the default. But maybe PDO::ATTR_PREFETCH, used as a boolean (1 being the default to continue using PQexec, so an explicit set to 0 would be required to use PQsetSingleRowMode), could be used to this effect.

As for the code structure: the cursors implementation is a great starting point, because its workflow is very similar (necessity to handle interrupts after partial results, to close a previous unclosed cursor…).

PHP Version

PHP 8.3

Operating System

No response

devnexen commented 1 month ago

That is closer to a feature request to me, wdyt @SakiTakamachi ?

outtersg commented 1 month ago

@devnexen wrote:

That is closer to a feature request to me, wdyt @SakiTakamachi ?

I hesitated between both when creating it, and now that you say it and I reread my description, it makes more and more sense.

The memory hog for which I opened the issue, which is not documented, and the numerous workarounds found everywhere on the web, give credit to its 'bug' status. But as I said in my last paragraph of the Description, prudence on not breaking BC (and perhaps minor performance hit) would dictate an opt-in attribute instead of a transparent solution: in that case yes declaring it a feature makes sense (with at least a note on fetch()'s doc that for some drivers, counter-intuitively, fetch() does not lazily load results, and that a driver-specific option should be used to that goal).

devnexen commented 1 month ago

it is fine :-) the line is often blurry in this kind of case in general.

damianwadley commented 1 month ago

Sounds like you're using buffered queries and what you want is to not do that...

outtersg commented 2 weeks ago

@damianwadley: exactly. Now the pointers you gave are all related to MySql; the goal of this PostgreSQL-centered ticket would be to make Pdo\PgSql feature-wise on par with Pdo\MySql.

… And ideally it would be interface-wise too, but I see that MySql chose MYSQL_ATTR_USE_BUFFERED_QUERY which makes it MySql-specific 😞. There exists the generically named ATTR_PREFETCH attribute, but only Pdo\Odbc handles it.

outtersg commented 6 days ago

I'm attempting an implementation at PR #15750.