graphaware / neo4j-bolt-php

PHP Driver for Neo4j's Binary Protocol : Bolt
MIT License

Memory exhausts when used in a long-running process #33

Open monyxie opened 4 years ago

monyxie commented 4 years ago

I'm trying to export a table from MySQL to Neo4j using a PHP script. The table contains some 2 million rows. The script aborts every time at around 500k rows with this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 62914608 bytes) in E:\code\neo4j-demo\vendor\graphaware\neo4j-bolt\src\PackStream\StreamChannel.php on line 74

Call Stack:
    0.4029     407160   1. {main}() E:\code\neo4j-demo\src\index.php:0
    1.9149    4026888   2. Illuminate\Database\Query\Builder->chunkById() E:\code\neo4j-demo\src\index.php:61
  535.8047   68194584   3. {closure:E:\code\neo4j-demo\src\index.php:46-61}() E:\code\neo4j-demo\vendor\illuminate\database\Concerns\BuildsQueries.php:103
  535.8149   68823696   4. GraphAware\Neo4j\Client\Transaction\Transaction->commit() E:\code\neo4j-demo\src\index.php:56
  535.8156   68860616   5. GraphAware\Bolt\Protocol\V1\Transaction->runMultiple() E:\code\neo4j-demo\vendor\graphaware\neo4j-php-client\src\Transaction\Transaction.php:205
  535.8229   69433616   6. GraphAware\Bolt\Protocol\Pipeline->run() E:\code\neo4j-demo\vendor\graphaware\neo4j-bolt\src\Protocol\V1\Transaction.php:161
  536.5241   69989080   7. GraphAware\Bolt\Protocol\V1\Session->run() E:\code\neo4j-demo\vendor\graphaware\neo4j-bolt\src\Protocol\Pipeline.php:62
  536.5253   69991792   8. GraphAware\Bolt\PackStream\Unpacker->unpack() E:\code\neo4j-demo\vendor\graphaware\neo4j-bolt\src\Protocol\V1\Session.php:106
  536.5254   69991824   9. GraphAware\Bolt\PackStream\StreamChannel->read() E:\code\neo4j-demo\vendor\graphaware\neo4j-bolt\src\PackStream\Unpacker.php:73

The script I'm using is as follows:

<?php

include "../vendor/autoload.php";

use GraphAware\Neo4j\Client\ClientBuilder;
use Illuminate\Container\Container;
use Illuminate\Database\Connectors\ConnectionFactory;
use Illuminate\Support\Collection;

$config = [
    'driver' => 'mysql',
    'host' => 'localhost',
    'port' => 3306,
    'database' => '...',
    'username' => '...',
    'password' => '...',
    'charset' => 'utf8mb4',
    'collation' => 'utf8mb4_general_ci',
    'prefix' => '',
    'strict' => true,
];

$container = new Container();
$factory = new ConnectionFactory($container);
$mysql = $factory->make($config);

$client = ClientBuilder::create()
    ->addConnection('bolt', 'bolt://neo4j:new4j@localhost:7687')
    ->build();

$total = $mysql->query()->from('follow')->count();

$startTime = time();

$done = 0;
$mysql->query()
    ->from('follow')
    ->chunkById(
        1000,
        function (Collection $rows) use ($total, &$done, $client) {
            echo "$done/$total {$rows->last()->id}\n";

            $tx = $client->transaction();
            foreach ($rows as $row) {
                $tx->push(
                    'MERGE (a:Member {uid:$uid}) MERGE (b:Member {uid:$followUid}) MERGE (a)-[:FOLLOWS]->(b)',
                    ['uid' => $row->uid, 'followUid' => $row->followUid]
                );
            }
            $tx->commit();

            $done += count($rows);
        }
    );

echo time() - $startTime;

betd-claumond commented 4 years ago

Hello @monyxie

Have you found a solution or a workaround for this issue, please?

On my side, I debugged a long-running process and found that the memory was being held in the $this->bytes property of the GraphAware\Bolt\PackStream\StreamChannel class.

In a long-running process, it appends all the data received since the start of the session to this property, so it grows without bound...

I haven't yet found a clean way to restart the session or the client because I'm using Symfony and its dependency injection, but in your case, I guess that recreating a new client from time to time would work around the issue.
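For a standalone script like the one above, that workaround could look roughly like this (a minimal sketch only, reusing the ClientBuilder setup from the original script; the $chunksPerClient threshold and makeClient helper are illustrative, not part of the library):

```php
<?php
// Sketch: rebuild the Bolt client every N chunks so the old connection's
// StreamChannel (and its growing $bytes buffer) can be garbage-collected.
// Assumes the same $mysql connection and credentials as the script above.

use GraphAware\Neo4j\Client\ClientBuilder;
use Illuminate\Support\Collection;

function makeClient()
{
    return ClientBuilder::create()
        ->addConnection('bolt', 'bolt://neo4j:new4j@localhost:7687')
        ->build();
}

$client = makeClient();
$chunks = 0;
$chunksPerClient = 50; // illustrative: recycle the client every ~50k rows

$mysql->query()->from('follow')->chunkById(
    1000,
    function (Collection $rows) use (&$client, &$chunks, $chunksPerClient) {
        $tx = $client->transaction();
        foreach ($rows as $row) {
            $tx->push(
                'MERGE (a:Member {uid:$uid}) MERGE (b:Member {uid:$followUid}) MERGE (a)-[:FOLLOWS]->(b)',
                ['uid' => $row->uid, 'followUid' => $row->followUid]
            );
        }
        $tx->commit();

        // Drop the reference so the old client and its buffered bytes
        // become collectible, then start a fresh session.
        if (++$chunks % $chunksPerClient === 0) {
            $client = null;
            gc_collect_cycles();
            $client = makeClient();
        }
    }
);
```

This caps memory at roughly one client's worth of buffered data, at the cost of re-handshaking the Bolt connection every $chunksPerClient chunks.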

It would be nice if you could let us know how you solved this on your side.

Best Regards, Christophe

monyxie commented 4 years ago

@betd-claumond Yep, that's what I found too. I believe I changed the script so that it could pick up the work from the last run and continue, but I don't have a solution for the leak itself. It was just a one-time-use script. Sorry.
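For reference, the pick-up-where-it-left-off approach can be done by checkpointing the last processed id and filtering on it at startup (a sketch under stated assumptions: the progress.txt checkpoint file is invented for illustration, and the follow table's id column is assumed to be the one chunkById orders by):

```php
<?php
// Sketch: make the export resumable by persisting the last exported id,
// so a crashed run can be restarted without redoing finished chunks.
// Assumes $mysql and $client are set up as in the original script.

use Illuminate\Support\Collection;

$checkpointFile = __DIR__ . '/progress.txt'; // illustrative file name
$lastId = is_file($checkpointFile) ? (int) file_get_contents($checkpointFile) : 0;

$mysql->query()
    ->from('follow')
    ->where('id', '>', $lastId) // skip rows already exported in a prior run
    ->chunkById(1000, function (Collection $rows) use ($client, $checkpointFile) {
        $tx = $client->transaction();
        foreach ($rows as $row) {
            $tx->push(
                'MERGE (a:Member {uid:$uid}) MERGE (b:Member {uid:$followUid}) MERGE (a)-[:FOLLOWS]->(b)',
                ['uid' => $row->uid, 'followUid' => $row->followUid]
            );
        }
        $tx->commit();

        // Record progress only after the transaction has committed,
        // so a crash never skips uncommitted rows.
        file_put_contents($checkpointFile, (string) $rows->last()->id);
    });
```

Because the MERGE statements are idempotent, re-running a chunk after a crash is harmless, so the checkpoint only needs to be approximately up to date.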

betd-claumond commented 4 years ago

@monyxie thanks for the feedback!