reactphp / stream

Event-driven readable and writable streams for non-blocking I/O in ReactPHP.
https://reactphp.org/stream/
MIT License

Stream data into multiple CSV files and stream a ZIP file with these #158

Closed stereomon closed 3 years ago

stereomon commented 3 years ago

I would like to export data from a fast-growing database table (assume more than 1 million entries). Additionally, I would like to produce several CSV files with different data, for example orders.csv and order-items.csv. I don't want to pull all the data into memory, so I was thinking about streaming the data into CSV files and streaming those into a ZIP archive that is itself streamed as well. I can't find any documentation on how to achieve that, and I wasn't able to work it out by trial and error. What I'm missing entirely is a way to stream the CSV file streams into a ZIP file stream.

Is there anything you can point me to to achieve my goal or is it simply not possible? I guess I'm too focused on something that can't work to be able to come up with other ideas...

$loop = \React\EventLoop\Factory::create();

// $stream1 and $stream2 are placeholders: they should be replaced with something I can stream into a ZIP stream.
$outputStream1 = new \React\Stream\WritableResourceStream($stream1, $loop);
$csvStream1 = new \Clue\React\Csv\Encoder($outputStream1);

$outputStream2 = new \React\Stream\WritableResourceStream($stream2, $loop);
$csvStream2 = new \Clue\React\Csv\Encoder($outputStream2);

// This is what I'm looking for. I also looked into maennchen/zipstream-php, but it requires a tmpfile.
$zip->addFromStream('file1.csv', $outputStream1);
$zip->addFromStream('file2.csv', $outputStream2);

// Queue the rows first, then run the loop so the buffered writes are actually flushed.
foreach ($rows as $row) {
    $csvStream1->write($row);
    $csvStream2->write($row);
}

$loop->run();

stereomon commented 3 years ago

@clue I guess you can help me here. Would be nice to get your input.

clue commented 3 years ago

@stereomon Thanks for bringing this up, this is an interesting one!

To recap, you're trying to stream a large number of records / rows from a database through a CSV stream and into a compressed archive?

Most of this is indeed already possible, but I'm not aware of any streaming ZIP implementation that builds on top of ReactPHP at the moment. This would be completely possible and I would love to see one! (If this is a commercial project and you want me to take a look, please shoot me a mail and we'll get this sorted!)

A ZIP archive is essentially a continuous stream of (compressed) files and some metadata regarding each file. Adding a new file to an archive isn't too hard, but because each file's data is stored as one contiguous block, you're going to have a hard time streaming multiple files into a single archive concurrently.

As a starting point, you may want to use ReactPHP's ChildProcess to temporarily compress independent archive files and then combine them into a single archive file.
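
One possible reading of the ChildProcess suggestion is sketched below: let the export write the CSV files to disk first, then hand them to an external archiver without blocking the event loop. This is only a sketch under assumptions that are not from this thread: the `zip` command-line tool is installed, both CSV files already exist in the working directory, and `dump.zip` is a made-up output name.

```php
<?php
// composer require react/child-process
require __DIR__ . '/vendor/autoload.php';

use React\ChildProcess\Process;
use React\EventLoop\Factory;

$loop = Factory::create();

// Assumption: the export has already written these files to disk.
$files = ['orders.csv', 'order-items.csv'];

// Assumption: the Info-ZIP `zip` tool is available; -j stores the files without their directory paths.
$command = 'exec zip -j dump.zip ' . implode(' ', array_map('escapeshellarg', $files));

$process = new Process($command);
$process->start($loop);

// Forward anything the archiver reports on STDERR.
$process->stderr->on('data', function ($chunk) {
    fwrite(STDERR, $chunk);
});

// The exit event fires once the child process has terminated.
$process->on('exit', function ($exitCode) {
    echo $exitCode === 0
        ? "dump.zip written\n"
        : "zip failed with exit code $exitCode\n";
});

$loop->run();
```

The archive is only complete once the `exit` event reports code 0, so anything that serves `dump.zip` to a client should start from that callback.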

As an alternative, and depending on your use case, you may keep these as separate archive files, or use https://github.com/clue/reactphp-zlib to stream the data into two independent orders.csv.gz and order-items.csv.gz files (GZIP != ZIP). Once https://github.com/clue/reactphp-tar/issues/2 is completed, you could also combine multiple files into a TAR archive before compressing it, creating a dump.tar.gz (or dump.tgz) file.
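
For the GZIP route, a minimal sketch of one such pipeline might look like the following. It assumes react/stream, clue/reactphp-csv and clue/reactphp-zlib are installed via Composer. The two sample rows are made up for illustration; a real export would read rows from the database in batches instead of holding them in an array.

```php
<?php
// composer require react/stream clue/reactphp-csv clue/reactphp-zlib
require __DIR__ . '/vendor/autoload.php';

use Clue\React\Csv\Encoder;
use Clue\React\Zlib\Compressor;
use React\EventLoop\Factory;
use React\Stream\WritableResourceStream;

$loop = Factory::create();

// Hypothetical sample data standing in for the database result.
$rows = [[1, 'alice'], [2, 'bob']];

// Pipeline: CSV encoder -> GZIP compressor -> file on disk.
$file = new WritableResourceStream(fopen('orders.csv.gz', 'w'), $loop);
$gzip = new Compressor(ZLIB_ENCODING_GZIP);
$gzip->pipe($file);
$csv = new Encoder($gzip);

foreach ($rows as $row) {
    $csv->write($row);
}

// Ending the encoder ends the compressor, which flushes the GZIP trailer and then ends the file stream.
$csv->end();

$loop->run();
```

A second, identical pipeline writing to order-items.csv.gz can run on the same loop, since each GZIP file is an independent stream. With millions of rows, the plain `foreach` above would buffer more than is desirable, so feeding rows in incrementally as the database delivers them is the safer design.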

I hope this helps :+1: