JimmyWhitaker closed this 1 year ago
Nice!
One high-level question: I don't know how pandas works. Does this stage the results in memory? Does it (or could it later be made to) split the results into multiple files, or does it just write one huge file?
I looked into this, and there doesn't seem to be a simple way to stream the results through an in-memory buffer to the destination Parquet file. We should scale-test this to see what the limitations are.
It looks like we might need to write something similar to this: https://stackoverflow.com/questions/54673272/pandas-gets-stuck-when-trying-to-read-from-bigquery
This connector lets you query a BigQuery database and store the result as a Parquet file. It is very simple and can easily be extended to support additional features.
Limitations: