treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Update Quickstart to use DuckDB WASM so we can remove the DuckDB Docker image #6076

Closed rmoff closed 1 year ago

rmoff commented 1 year ago

Now that https://github.com/treeverse/lakeFS/pull/6044 is merged, we should be able to update the Quickstart to use DuckDB in the browser alone and remove the DuckDB Docker image.

rmoff commented 1 year ago

@ozkatz

  1. Command statements such as SET just show "No Rows Returned" when run. There should be something to confirm successful execution (presumably the absence of an error indicates that, but the user shouldn't have to guess).

    [screenshot: CleanShot_2023-06-14_at_16 30 54]

  2. ALLOW_OVERWRITE doesn't work, and needs to for this example.

    [screenshot: CleanShot_2023-06-14_at_16 27 12]

  3. Side note: writing data back doesn't seem to work, but it doesn't throw an error to the user either.

    1. If I remove the ALLOW_OVERWRITE I don't get an error, but the file isn't updated.
    2. If I write it to a new file, the file doesn't get written. There's a warning logged in the console. [screenshot: CleanShot_2023-06-14_at_16 38 23]
  4. Should we add a "Launch DuckDB" button somewhere in the UI? If we're now saying that users can manipulate their data throughout the lake using DuckDB, making them launch it by finding a compatible file in the Objects pane is a bit roundabout.

ozkatz commented 1 year ago

  1. Why would you need to run those in the browser? It's already preconfigured.
  2. I was able to write by simply doing COPY lakes TO 'lakefs://repo/branch/path.parquet', without any format or overwrite directives.
  3. Unfortunately, duckdb-wasm and regular duckdb handle IO pretty differently (and inconsistently). Another example is the S3 parameters you set in your screenshot: s3_endpoint (which, again, is preconfigured for you, so there's no need to set it) needs to be a fully qualified URL, including the leading https:// part. s3_url_style doesn't even exist in duckdb-wasm, but is implied by the existence of 'http' in the endpoint.
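
To make the endpoint rule in point 3 concrete, here is a minimal Python sketch that checks whether an endpoint value counts as fully qualified and, if not, prefixes a scheme. The helper names and the example hosts are hypothetical and are not part of lakeFS or DuckDB; the sketch only mirrors the rule as described above and says nothing about how duckdb-wasm parses the setting internally.

```python
from urllib.parse import urlparse


def is_fully_qualified_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint has an http(s) scheme and a host part."""
    parsed = urlparse(endpoint)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def qualify_endpoint(endpoint: str, use_ssl: bool = True) -> str:
    """Prefix a bare host[:port] with a scheme; leave qualified URLs unchanged."""
    if is_fully_qualified_endpoint(endpoint):
        return endpoint
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{endpoint}"


if __name__ == "__main__":
    # Hypothetical values: a bare host:port versus a fully qualified URL.
    for value in ("localhost:8000", "https://lakefs.example.com"):
        print(value, "->", qualify_endpoint(value, use_ssl=False))
```

A bare value such as localhost:8000 is the kind of setting that would need the scheme added before being used as s3_endpoint in the browser, per the comment above.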
ozkatz commented 1 year ago

As for a "launch duckdb" button - yes, that makes sense, could you open an issue for it? (honestly not sure we'll be able to prioritize it any time soon but let's try)