splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0

Upload endpoint refinements #439

Closed gruuya closed 1 year ago

gruuya commented 1 year ago

Instead of buffering the entire file in memory, stream the incoming bytes to a local temp file, and then scan that file when appending to the target table.
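A minimal sketch of the streaming idea, using only the Rust standard library and not Seafowl's actual upload handler. The function name `spool_to_file` and the 8 KiB chunk size are assumptions for illustration; the real endpoint receives a multipart request body, not a generic `Read`.

```rust
use std::fs::File;
use std::io::{self, BufWriter, ErrorKind, Read, Write};
use std::path::Path;

/// Copy `body` to the file at `path` in fixed-size chunks and return the
/// number of bytes written. Peak memory is bounded by the chunk size
/// (hypothetical 8 KiB here), not by the size of the upload.
fn spool_to_file<R: Read>(mut body: R, path: &Path) -> io::Result<u64> {
    let mut out = BufWriter::new(File::create(path)?);
    let mut chunk = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match body.read(&mut chunk) {
            Ok(0) => break, // end of the incoming stream
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&chunk[..n])?;
        total += n as u64;
    }
    out.flush()?;
    Ok(total)
}
```

Once the body is on disk, the file can be scanned lazily when appending to the table, so neither step needs the whole upload in memory at once.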

Memory implications (using a 160 MB Parquet file): [images not included]

In addition, bump DataFusion to a post-26 version to pick up schema coercion in the Parquet reader, and thus close #179. Also add some tests demonstrating the new flexibility of the upload endpoint (implicit type casting, column skipping, etc.).