Closed rmoff closed 1 year ago
Looks like this was done here: https://github.com/treeverse/lakeFS/pull/4903/. There is a trade-off between performance and data freshness here, and we decided to side with performance. However, I agree that having no way to refresh the data is a problem.
What happened?
Current Behavior:
When you execute a query in the DuckDB pane of the object page and then change the underlying object, if you re-execute the query the results don't change.
Steps to Reproduce:
Spin up the Docker Compose from https://github.com/treeverse/lakeFS/tree/docs/devex-173-quickstart/quickstart
From http://127.0.0.1:8000/repositories/quickstart/object?ref=main&path=lakes.parquet run the default DuckDB query. Note the results.
Get a DuckDB CLI prompt:
docker exec -it duckdb duckdb
Load the parquet file as a table, delete some rows, and write it back to lakeFS
Read the parquet file back directly to verify the change to the data:
In the same browser window as before, click Execute. Note that the data does not change. Even if you change the value in the LIMIT clause (e.g. from 20 to 5) the new data is not shown.
Refresh the web page using the browser's controls and note that the correct data is now shown.
https://user-images.githubusercontent.com/3671582/225373703-2a7a2b99-f2ac-483e-953e-9bdf7ff6c6fb.mp4
Expected Behavior
When you run a query with DuckDB it should show the current data in the file.
If it is not going to do this, then the UI should indicate very clearly that the data could be stale and provide a button to force a refresh without requiring the user to reload the page (and thus lose their SQL query).
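To make the requested behavior concrete, here is a minimal sketch of a cache that keeps the fast path but lets the caller force a refresh. It is not how the lakeFS UI is implemented; `ObjectCache`, `read`, `force_refresh`, and the in-memory `store` are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class ObjectCache:
    """Hypothetical client-side cache of object bytes, keyed by object path."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # function that reads the object from storage
        self._cache: Dict[str, bytes] = {}

    def read(self, path: str, force_refresh: bool = False) -> bytes:
        # Reuse the cached copy unless the caller explicitly asks for fresh data.
        if force_refresh or path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]


# Stand-in for the object store (assumption); a real UI would fetch over HTTP.
store = {"lakes.parquet": b"100 rows"}
cache = ObjectCache(lambda path: store[path])

first = cache.read("lakes.parquet")  # fetched and cached
store["lakes.parquet"] = b"90 rows"  # the object changes underneath the cache
stale = cache.read("lakes.parquet")  # still the cached copy, as in this report
fresh = cache.read("lakes.parquet", force_refresh=True)  # refetched on demand
```

In this sketch the default call stays fast and may return stale data, while a "refresh" button in the UI would map to `force_refresh=True`, so the user keeps their SQL query and still sees the current file contents.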
lakeFS Version
0.96.1
Deployment
Docker
Affected Clients
No response
Relevant logs output
No response
Contact Details
No response