wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License
5.9k stars 605 forks source link

sql2csv is buffering the output instead of outputting data as it comes #1203

Closed Vistritium closed 9 months ago

Vistritium commented 1 year ago

I'm trying to write a large dataset to a file with sql2csv from mariadb. But the process is getting killed because it runs out of memory. It looks like sql2csv is buffering all the data in the memory and only outputs it as a last step, which makes it unusable for large data sets and also make it use memory resources unnecessarily.

sql2csv should output the data as it comes, or even better, it should have an option to write data to file directly instead of relying on user redirecting output to file.

jpmckinney commented 11 months ago

I'm reading the code and I don't see where it could be loading the results of the SQL query into memory, unless SQLAlchemy or your MariaDB driver is doing it. The code is very simple:

https://github.com/wireservice/csvkit/blob/f1ee8ff4adba35331dce18a0ea6fd11c19ff81bb/csvkit/utilities/sql2csv.py#L72-L80

jpmckinney commented 11 months ago

What's the memory consumption when you just run the SQL query directly on MariaDB?

jpmckinney commented 9 months ago

Closing as no response to assist with reproducing the issue.