loophp / collection

A (memory) friendly, easy, lazy and modular collection class.
https://loophp-collection.rtfd.io/
MIT License
721 stars 35 forks source link

Weird interplay between Collection and PDO result set #310

Closed dmjohnsson23 closed 1 year ago

dmjohnsson23 commented 1 year ago

I've come across an issue in which the get and cache methods don't seem to be behaving correctly when used with a PDO result set, as in the following code:

<?php
use \loophp\collection\Collection;
$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);
echo $statement->rowCount();
$collection = Collection::fromIterable($statement)->cache();
$collection->get(0);
$collection->get(0);

foreach ($collection as $i=> $r){
    var_dump($i, $r);
}

This produces the following output (formatted, and with some fields removed for brevity):

3

int 0
array (size=8)
  'id' => int 26580
  'user_id' => int 45
  'type' => string 'phone' (length=5)
  'remarks' => string '<p>test test</p>' (length=16)
  'date' => string '2023-09-26 09:04:00' (length=19)

The executed query returns three results, however, when I iterate over the collection, I get a single result. If I comment out both of the $collection->get(0);, however, I get the expect results:

3

int 0
array (size=8)
  'id' => int 26580
  'user_id' => int 45
  'type' => string 'phone' (length=5)
  'remarks' => string '<p>test test</p>' (length=16)
  'date' => string '2023-09-26 09:04:00' (length=19)

int 1
array (size=8)
  'id' => int 26581
  'user_id' => int 45
  'type' => string 'fax' (length=3)
  'remarks' => string '<p>Flagged it</p>' (length=17)
  'date' => string '2023-09-26 09:07:00' (length=19)

int 2
array (size=8)
  'id' => int 26582
  'user_id' => int 45
  'type' => string 'home_visit' (length=10)
  'remarks' => string '<p>ffdfhfhd</p>' (length=15)
  'date' => string '2023-09-26 09:07:00' (length=19)

At first, it made me wonder if get was intended to remove items from the collection. However, I also get my expected results (all three rows) with either of the two following changes to my code: (using squash or using fetchAll)

<?php
use \loophp\collection\Collection;
$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);
echo $statement->rowCount();
$collection = Collection::fromIterable($statement->fetchAll())->cache();
$collection->get(0);
$collection->get(0);

foreach ($collection as $i=> $r){
    var_dump($i, $r);
}
<?php
use \loophp\collection\Collection;
$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);
echo $statement->rowCount();
$collection = Collection::fromIterable($statement)->cache()->squash();
$collection->get(0);
$collection->get(0);

foreach ($collection as $i=> $r){
    var_dump($i, $r);
}

I also tried a similar thing with a generator function, and got the results I expected:

<?php
use \loophp\collection\Collection;
function getn(){
    for($i=1; $i < 5; $i++) yield $i;
}
$collection = Collection::fromIterable(getn())->cache();
$collection->get(0);
$collection->get(0);

foreach ($collection as $r){
    var_dump($r);
}
int 1
int 2
int 3
int 4

So, in every case except the very first one, get leaves the item in the collection. For some reason though, when I try to convert an executed PDO statement into a Collection, it doesn't. Even though I used cache to make it rewindable. Still looking to see if there are any other iterable types that have the same weird behavior as PDO.

dmjohnsson23 commented 1 year ago

I spent some time digging into the source myself so see if I could isolate the problem, and arrived at this minimal example that reproduces the issues:

<?php
require_once 'inc/prelude.php';
use loophp\iterators\ClosureIteratorAggregate;
use loophp\collection\Iterator\PsrCacheIterator;
use Symfony\Component\Cache\Adapter\ArrayAdapter;
use loophp\iterators\FilterIterableAggregate;
$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);
echo $statement->rowCount();
$cached = new ClosureIteratorAggregate(fn () => yield from new PsrCacheIterator($statement->getIterator(), new ArrayAdapter()), []);
$get = fn () => yield from new FilterIterableAggregate($cached, fn()=>true);
$get()->current();
$get()->current();

foreach ($cached as $i=> $r){
    var_dump($i, $r);
}

However, "unwapping" the ClosureIteratorAggregate seems to solve the issue. This code produces the expected results:

<?php
require_once 'inc/prelude.php';
use loophp\iterators\ClosureIteratorAggregate;
use loophp\collection\Iterator\PsrCacheIterator;
use Symfony\Component\Cache\Adapter\ArrayAdapter;
use loophp\iterators\FilterIterableAggregate;
$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);
echo $statement->rowCount();
$cached = new PsrCacheIterator($statement->getIterator(), new ArrayAdapter());
$get = fn () => yield from new FilterIterableAggregate($cached, fn()=>true);
$get()->current();
$get()->current();

foreach ($cached as $i=> $r){
    var_dump($i, $r);
}

Therefore the issue may actually be in the dependency loophp\iterators.

github-actions[bot] commented 1 year ago

Since this issue has not had any activity within the last 5 days, I have marked it as stale. I will close it if no further activity occurs within the next 5 days.

drupol commented 1 year ago

Hello,

I apologize for the delayed response; my schedule has been quite packed lately. Thank you for your patience.

Were you able to find a solution to the issue in the meantime?

dmjohnsson23 commented 1 year ago

No worries, it's not a sin to have a life outside of Github. Take the time you need with what is important to you.

I was able to work around the issue by using fetchAll to convert the iterable to an array before turning it into a Collection. Not being able to use cache in unfortunate, but it isn't a show-stopper.

However, I never did manage to identify the root cause of the issue or create an actual fix. I suspect that somewhere along the chain of iteratables there is one that isn't fully abstracting over the previous one in the chain, but I spent several hours looking and couldn't find it.

drupol commented 1 year ago

I have to admit that I wish I could reproduce this easily at home, but I don't have any project using Doctrine right now, so I don't know how to do this.

Do you have a way to reproduce this easily with a SQLite based project, for example? So we could share a project and reproduce the issue?

dmjohnsson23 commented 1 year ago

I'm actually not using Doctrine either, I'm using PDO.

Sure, here is a complete, self-contained script that reproduces the issue using SQLite:

<?php require_once 'vendor/autoload.php';
use loophp\iterators\ClosureIteratorAggregate;
use loophp\collection\Iterator\PsrCacheIterator;
use Symfony\Component\Cache\Adapter\ArrayAdapter;
use loophp\iterators\FilterIterableAggregate;

$pdo = new PDO('sqlite::memory:');
$pdo->query("CREATE TABLE communication(
    id INTEGER PRIMARY KEY,
    date TEXT,
    remarks TEXT
);");
$pdo->query("INSERT INTO communication
    (id, date, remarks)
    VALUES
    (1, '2023-09-17 11:47:11', 'This is the first one'),
    (2, '2023-09-18 07:52:19', 'This is the second one'),
    (3, '2023-09-19 13:05:17', 'This is the third one');
");

$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);

$cached = new ClosureIteratorAggregate(fn () => yield from new PsrCacheIterator($statement->getIterator(), new ArrayAdapter()), []);
$get = fn () => yield from new FilterIterableAggregate($cached, fn()=>true);
$get()->current();
$get()->current();

foreach ($cached as $i=> $r){
    var_dump($i, $r);
}

Actual output:

int(0)
array(3) {
  ["id"]=>
  int(3)
  ["date"]=>
  string(19) "2023-09-19 13:05:17"
  ["remarks"]=>
  string(21) "This is the third one"
}

Expected output:

int(0)
array(3) {
  ["id"]=>
  int(1)
  ["date"]=>
  string(19) "2023-09-17 11:47:11"
  ["remarks"]=>
  string(21) "This is the first one"
}
int(1)
array(3) {
  ["id"]=>
  int(2)
  ["date"]=>
  string(19) "2023-09-18 07:52:19"
  ["remarks"]=>
  string(22) "This is the second one"
}
int(2)
array(3) {
  ["id"]=>
  int(3)
  ["date"]=>
  string(19) "2023-09-19 13:05:17"
  ["remarks"]=>
  string(21) "This is the third one"
}
drupol commented 1 year ago

Thanks for the snippet.

I modified it a bit:

<?php declare(strict_types=1);

namespace Bug;

use loophp\collection\Collection;
use PDO;

require_once __DIR__ . '/vendor/autoload.php';

$pdo = new PDO('sqlite::memory:');
$pdo->query("CREATE TABLE communication(
    id INTEGER PRIMARY KEY,
    date TEXT,
    remarks TEXT
);");
$pdo->query("INSERT INTO communication
    (id, date, remarks)
    VALUES
    (1, '2023-09-17 11:47:11', 'This is the first one'),
    (2, '2023-09-18 07:52:19', 'This is the second one'),
    (3, '2023-09-19 13:05:17', 'This is the third one');
");

$fromDate = '2023-09-15 00:00:00';
$toDate = '2023-09-30 23:59:59';
$query = "SELECT *
          FROM communication
          WHERE `date` BETWEEN :fromDate AND :toDate";
$statement = $pdo->prepare($query);
$statement->bindValue('fromDate', $fromDate, PDO::PARAM_STR);
$statement->bindValue('toDate', $toDate, PDO::PARAM_STR);
$statement->execute();
$statement->setFetchMode(PDO::FETCH_ASSOC);

$collection = Collection::fromIterable($statement->getIterator());

$collection->current();
$collection->current();

foreach ($collection as $i => $r){
    var_dump($i, $r);
}

And the bug seems to be gone.

Is it working on your side?

dmjohnsson23 commented 1 year ago

Yes, that code seem to give the expected result. I played around with it a little, and the change that seems to make the difference is using $statement->getIterator() rather than just $statement. It's interesting to me, since PDOStatement implements IteratorAggregate, so I would generally consider it to be iterable all by itself. Now I'm really curious why that makes a difference.

drupol commented 1 year ago

I have the feeling that it is something internal... maybe it would be nice to reproduce the issue without using any libraries and submit it upstream.

dmjohnsson23 commented 1 year ago

I'm beginning to wonder the same thing; the bug may be with PDO or PHP itself.

drupol commented 1 year ago

I'll do further tests tomorrow. In any case, nice catch :)

github-actions[bot] commented 1 year ago

Since this issue has not had any activity within the last 5 days, I have marked it as stale. I will close it if no further activity occurs within the next 5 days.

drupol commented 1 year ago

@dmjohnsson23 The issue now closed, let me know if you need more assistance or eventually if you find more information about this issue. Feel free to reopen it in any case.