pachyderm / examples

A curated list of examples that use Pachyderm to accomplish various tasks.

Big query connector #48

Closed JimmyWhitaker closed 1 year ago

JimmyWhitaker commented 1 year ago

This connector lets you query a BigQuery database and store the result as a Parquet file. It is deliberately simple and can easily be extended to support additional features.

Limitations:

JimmyWhitaker commented 1 year ago

Nice!

One high-level question: I don't know how pandas works. Does this stage the results in memory? Does it (or could it later be made to) split the results into multiple files, or does it just write one huge file?

I looked into this, and there doesn't seem to be a simple way to stream the results through an in-memory buffer into the destination Parquet file. We should scale test this to see what the limitations are.

It looks like we might need to write something similar to this: https://stackoverflow.com/questions/54673272/pandas-gets-stuck-when-trying-to-read-from-bigquery
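As a rough sketch of the "multiple files" idea raised above (not what this PR implements), the rows could be consumed in bounded batches, with each batch written to its own Parquet file so that only one batch is in memory at a time. The names `chunked`, `write_parquet`, `write_in_parts`, the `part-00000.parquet` naming scheme, and the `rows_per_file` default are all hypothetical; the writer assumes pandas with a Parquet engine such as pyarrow is installed.

```python
import itertools
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def write_parquet(batch: List[dict], path: Path) -> None:
    """Write one batch as a Parquet file (assumes pandas plus a Parquet engine)."""
    import pandas as pd  # imported lazily so the chunking logic has no dependencies

    pd.DataFrame(batch).to_parquet(path, index=False)


def write_in_parts(
    rows: Iterable[dict],
    out_dir: str,
    rows_per_file: int = 100_000,  # hypothetical default; tune after scale testing
    writer: Callable[[List[dict], Path], None] = write_parquet,
) -> List[Path]:
    """Write `rows` as a series of part files, holding one batch in memory at a time."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunked(rows, rows_per_file)):
        path = out / f"part-{i:05d}.parquet"
        writer(batch, path)
        paths.append(path)
    return paths
```

With the `google-cloud-bigquery` client, the rows could plausibly come from something like `(dict(row.items()) for row in client.query(sql).result())`, with `out_dir` pointing at the pipeline's output directory. Whether this actually keeps memory bounded depends on how the client pages results, which is exactly what the scale test should confirm.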