halaxa / json-machine

Efficient, easy-to-use, and fast PHP JSON stream parser
Apache License 2.0

Re-iterate #80

Closed LagShaggy closed 2 years ago

LagShaggy commented 2 years ago

Hello there,

  1. I have a problem where I need to iterate through the JSON multiple times. If I start the second iteration attempt I get the following error:

Fatal error: Uncaught JsonMachine\Exception\SyntaxErrorException: Cannot iterate empty JSON '' At position 0. in /var/www/vendor/halaxa/json-machine/src/Parser.php:368

    Stack trace:
    #0 /var/www/vendor/halaxa/json-machine/src/Parser.php(245): JsonMachine\Parser->error('Cannot iterate ...', NULL)
    #1 /var/www/src/Workorder.php(101): JsonMachine\Parser->getIterator()
    #2 /var/www/src/Workorder.php(87): App\Workorder->count(Object(JsonMachine\Items))
    #3 /var/www/src/Workflow.php(20): App\Workorder->push()
    #4 /var/www/public/index.php(11): App\Workflow->execute()
    #5 {main}
    thrown in /var/www/vendor/halaxa/json-machine/src/Parser.php on line 368

  2. Is there a way to remove items?

halaxa commented 2 years ago

That's correct. The stream has been read to the end and there's nothing left to read. Rewind the stream yourself and create a new instance of Items.
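A minimal sketch of this suggestion, using plain PHP streams so it is self-contained (the `Items::fromStream()` calls in the comments follow the API shown later in this thread): a stream can only be consumed once, so a second pass means rewinding it and constructing a fresh iterator.

```php
<?php
// A stream is consumed by the first pass; rewinding it makes the
// data readable again, at which point a new Items instance can be
// created. Demonstrated here with stream_get_contents() stand-ins.

$stream = fopen('php://memory', 'r+');
fwrite($stream, '[{"id": 1}, {"id": 2}]');
rewind($stream);                                 // first pass starts at position 0

$firstPass = stream_get_contents($stream);       // consumes the stream
$afterFirstPass = stream_get_contents($stream);  // now empty: ''

rewind($stream);                                 // seek back to the beginning...
$secondPass = stream_get_contents($stream);      // ...and the data is readable again

// With json-machine, the pattern would be:
//   $items = \JsonMachine\Items::fromStream($stream);
//   foreach ($items as $item) { /* first pass */ }
//   rewind($stream);
//   $items = \JsonMachine\Items::fromStream($stream);  // fresh instance for pass two
```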

LagShaggy commented 2 years ago

My problem is that I encapsulate the creation of Items, so I don't have access to the JSON anymore. How do I rewind the Items without re-creating an instance?

My code:

    private function returnJsonItem(\Psr\Http\Message\ResponseInterface $response): \JsonMachine\Items
    {
        $phpStream = \GuzzleHttp\Psr7\StreamWrapper::getResource($response->getBody());

        return \JsonMachine\Items::fromStream($phpStream);
    }

halaxa commented 2 years ago

What about $response->getBody()->seek(0) from outside the function? Not nice, but you can reorganize your code to make it clearer ;)
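One way to keep the encapsulation while following this advice is to rewind inside the helper itself, so every call yields a fresh pass. A hedged sketch with plain PHP streams (the helper name is illustrative; with the code above, the rewind would be `$response->getBody()->seek(0)` before `Items::fromStream()`):

```php
<?php
// Sketch: rewinding inside the helper makes it safe to call repeatedly,
// because each call starts reading from position 0 again.

function freshPass($stream): string
{
    rewind($stream);                    // equivalent of $body->seek(0)

    return stream_get_contents($stream);
    // With json-machine this would instead return
    // \JsonMachine\Items::fromStream($stream);
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, '[1, 2, 3]');

$a = freshPass($stream);
$b = freshPass($stream);   // second call works because the helper rewound
```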

halaxa commented 2 years ago

However, JSON Machine is built with one pass in mind; this behavior is undefined. You are better off creating a new instance of Items for each repeated iteration.
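If repeated passes are genuinely needed and the data fits in memory, one alternative (an assumption on my part, not something the library prescribes) is to materialize the single streaming pass into an array and iterate that instead:

```php
<?php
// Sketch: collect the one-shot iterator into an array on the first
// pass; later passes iterate the array, which is rewindable for free.
// With json-machine, $oneShot would come from Items::fromStream().

function materialize(iterable $items): array
{
    $all = [];
    foreach ($items as $key => $value) {
        $all[$key] = $value;
    }

    return $all;
}

// Stand-in for a one-shot streaming iterator: a generator can also
// only be consumed once, just like an Items instance.
$oneShot = (function () {
    yield from [10, 20, 30];
})();

$cached = materialize($oneShot);
$sum1 = array_sum($cached);
$sum2 = array_sum($cached);  // second pass is fine: it's an array now
```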

meduzen commented 2 years ago

What I do in such a situation is not to walk the JSON as part of a bigger process, but to start from the JSON and dispatch async jobs that handle the creation of items, for example in chunks of 50 items.

That looks like this (in a Laravel project, inside a Job class):

    /**
     * Stream the data from the JSON file, and chunk them.
     */
    public function handle()
    {
        $stream = JsonMachine::fromFile($this->filePath, '', new ExtJsonDecoder());
        $chunks = collect();

        // Iterate over the JSON stream.
        foreach ($stream as $item) {
            $chunks->push($item);

            // Dispatch chunks by $this->chunkSize and reset the Collection.
            if ($chunks->count() == $this->chunkSize) {
                $this->addToBatch($chunks);
                $chunks = collect();
            }
        }

        // Dispatch remaining chunks.
        if ($chunks->isNotEmpty()) {
            $this->addToBatch($chunks);
        }
    }

    /**
     * Push chunks of entries to the current batch of jobs.
     */
    protected function addToBatch(Collection $chunks): void
    {
        $this->batch()->add([
            new HandleChunks($chunks),
        ]);
    }