neo4j-php / neo4j-php-client

PHP client and driver for the Neo4j database
https://neo4j.com/developer/php/
MIT License
160 stars, 40 forks

Duplicate results returned when no. of results is more than fetch size #149

Closed (smivz closed this issue 1 year ago)

smivz commented 2 years ago

Duplicate results are returned when the number of results exceeds the fetch size. This seems to be related to SessionConfiguration::DEFAULT_FETCH_SIZE, which is 1000 by default. When a query matches more than 1000 results, the client returns the first 1000 results multiple times. In the example below there are 4000 results to fetch, but what we got was the first 1000 results four times.
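To make the symptom concrete, here is a toy model in plain PHP. It is only a sketch of one way such output could arise (a batch cursor that never advances). It is not the driver's code, and this report does not establish the actual cause. The function name `fetchInBatches` and its parameters are hypothetical.

```php
<?php

/**
 * Toy model of batched fetching. NOT the driver's implementation.
 *
 * @param int[] $rows      all rows the query matches
 * @param int   $fetchSize rows pulled per batch
 * @param bool  $advance   whether the cursor moves forward after each batch
 * @return int[]
 */
function fetchInBatches(array $rows, int $fetchSize, bool $advance): array
{
    $fetched = [];
    $offset = 0;
    $batches = (int) ceil(count($rows) / $fetchSize);

    for ($i = 0; $i < $batches; $i++) {
        // Append the next batch, starting at the current offset.
        $fetched = array_merge($fetched, array_slice($rows, $offset, $fetchSize));

        // Assumed failure mode: when $advance is false the offset never
        // moves, so every batch is a copy of the first one.
        if ($advance) {
            $offset += $fetchSize;
        }
    }

    return $fetched;
}

$rows = range(1, 4000);
$correct = fetchInBatches($rows, 1000, true);
$buggy = fetchInBatches($rows, 1000, false);

echo count($correct) . ' rows, ' . count(array_unique($correct)) . ' unique' . PHP_EOL;
echo count($buggy) . ' rows, ' . count(array_unique($buggy)) . ' unique' . PHP_EOL;
```

Both calls return 4000 rows, but only the first contains 4000 distinct values. The second contains 1000, which matches the counts described in this report.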

To Reproduce

    $client = ClientBuilder::create()
        ->withDriver('bolt', 'bolt://username:password@localhost')
        ->withDefaultDriver('bolt')
        ->build();

    // Add 4000 user nodes
    for ($i = 0; $i < 4000; $i++) {
        $client->run('CREATE (user:User)');
    }

    // Confirm that the database contains 4000 unique user nodes
    $userCountResults = $client->run('MATCH (user:User) RETURN COUNT(DISTINCT(ID(user))) as user_count');
    $userCount = $userCountResults[0]->get('user_count');
    if ($userCount === 4000) {
        echo "Confirmed that we now have 4000 user nodes" . PHP_EOL;
    } else {
        echo "Expecting 4000 user nodes, found " . $userCount;
        exit;
    }

    // Retrieve the ids of all user nodes
    $results = $client->run('MATCH (user:User) RETURN ID(user) AS id');

    // Loop through the results and add each id to an array
    $userIds = [];
    foreach ($results as $result) {
        $userIds[] = $result->get('id');
    }

    // Confirm we have 4000 items in the array
    if (count($userIds) === 4000) {
        echo "Confirmed that we now have 4000 ids in the array" . PHP_EOL;
    } else {
        echo "Expecting 4000 ids in the array, found " . count($userIds);
        exit;
    }

    // Check if we have any duplicate ids by removing duplicate values
    // from the array.
    $uniqueUserIds = array_unique($userIds);

    // Both the $userIds & $uniqueUserIds arrays should contain 4000
    // ids but $uniqueUserIds only contains 1000.
    if (count($userIds) !== count($uniqueUserIds)) {
        echo 'The $userIds count is: ' . count($userIds) . PHP_EOL;
        echo 'The $uniqueUserIds count is: ' . count($uniqueUserIds) . PHP_EOL;
        echo 'This is an issue as both should be 4000 ' . PHP_EOL;
    }
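When comparing counts is not enough, it can help to see which ids repeat and how often. The following is a minimal sketch in plain PHP that needs no database connection. The helper name `findDuplicateIds` and the sample data are hypothetical, standing in for the `$userIds` array built by the script above.

```php
<?php

/**
 * Return each id that occurs more than once, mapped to its occurrence count.
 *
 * @param array<int, int|string> $ids
 * @return array<int|string, int>
 */
function findDuplicateIds(array $ids): array
{
    // array_count_values() maps each distinct value to how often it occurs.
    $counts = array_count_values($ids);

    // Keep only the values that occur more than once (keys are preserved).
    return array_filter($counts, static fn (int $count): bool => $count > 1);
}

// Hypothetical sample data standing in for $userIds.
$sampleIds = [10, 11, 12, 10, 11, 10];
$duplicates = findDuplicateIds($sampleIds);

foreach ($duplicates as $id => $count) {
    echo "id {$id} was returned {$count} times" . PHP_EOL;
}
```

With the behavior described in this report, passing the real `$userIds` would be expected to list 1000 ids, each returned four times.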

Expected behavior

    $results = $client->run('MATCH (user:User) RETURN ID(user) AS id');

The above should not return duplicates and should instead return all results matched by the query.

Alternatively, only the number of results specified in the fetch limit should be returned. However, I would suggest that the default fetch limit be unlimited, so that the developer can set a fetch limit if they wish.

joecoolio commented 2 years ago

I am seeing the same thing in version 2.7.0.

transistive commented 1 year ago

Fixed by #157, release 2.8.0 and https://github.com/neo4j-php/neo4j-php-client/commit/f8b683f19d5fb3f08024095f53d4ba6a029d7af7

Will be released later today in version 2.8.1