Open tonyf opened 2 months ago
Are you able to reproduce this reliably? The error error decoding response body
is a bit of a generic error given by our HTTP client library.
The plan generated by that scan will be a "late materialized scan". First it will read the imagenet
column to get the valid row ids. Then it will generate a take
operation to grab the remaining columns from those row ids. If there is a scalar index then we skip reading the filter column and just scan the index to figure out which row ids to grab.
The only thing that jumps to mind is that maybe we are triggering more concurrent reads to S3 when a scalar index is used? I think the last time I encountered that error message it was related to reading very large values in a single request from S3.
Can you generate a trace and attach it? Just put this at the beginning on the script:
from lance.tracing import trace_to_chrome
trace_to_chrome(file="/some/file.json", level="debug")
Then it will generate /some/file.json
. You can attach that to this issue and we can take a look. Also, you can use https://ui.perfetto.dev/ if you want to look at the trace yourself.
I have a remote dataset stored on s3. Without scalar indices, using the scanner api with a filter works fine. However, once a scalar index is added, I get a
OSError: Io error: Execution error: Wrapped error: LanceError(IO): Generic S3 error: error decoding response body
Exception:
If I checkout the dataset to a version without the scalar index, no exception is raised