Open tonyf opened 3 months ago
This could be due to hard-coded concurrency in object_store: https://github.com/apache/arrow-rs/blob/a937869f892dc12c4730189e216bf3bd48c2561d/object_store/src/aws/mod.rs#L252
We might need to make this controllable upstream somehow.
Hm, is there any way to temporarily monkeypatch rust-level code in python?
Actually, we don't use delete_stream (mainly by chance) so we probably don't need to worry about object_store. I suspect this is fixed in 0.17.0b9 (released yesterday) via https://github.com/lancedb/lance/pull/2773
We were previously using num_cpus::get and now are using LANCE_IO_THREADS.
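For anyone trying the new setting, here is a minimal sketch of lowering the IO thread count from Python. It assumes LANCE_IO_THREADS is read from the process environment when Lance initializes its IO runtime, so it is set before importing lance. The helper name, the value 4, and the dataset URI are hypothetical, and the Lance calls are left commented out.

```python
import os


def configure_lance_io_threads(n: int) -> None:
    """Hypothetical helper: export LANCE_IO_THREADS for this process."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    # Environment variables are strings, so convert explicitly.
    os.environ["LANCE_IO_THREADS"] = str(n)


# Set the variable first (assumption: it is only read at initialization).
configure_lance_io_threads(4)

# Then import and use Lance as usual (placeholder URI):
# import lance
# ds = lance.dataset("s3://my-bucket/my-dataset")
# ds.cleanup_old_versions()
```

Whether this actually bounds the number of concurrent delete requests depends on which code path performs the deletes, which is still being discussed in this thread.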
Actually, we don't use delete_stream (mainly by chance) so we probably don't need to worry about object_store.
What makes you say that? I see us call remove_stream here:
Which dispatches to delete_stream here:
I'm now getting
OSError: LanceError(IO): Generic S3 error: Got invalid DeleteObjects response: unknown variant `Code`, expected `Deleted` or `Error`, /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/fns.rs:368:13
Maybe this is happening because a previous cleanup operation failed without marking the version as deleted so it's getting a not found? Not sure how to work around this.
What makes you say that? I see us call remove_stream here:
Ah, I was just searching for delete_stream and saw the parallelism on old_manifests and assumed that was it. My mistake.
OSError: LanceError(IO): Generic S3 error: Got invalid DeleteObjects response: unknown variant `Code`, expected `Deleted` or `Error`, /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/fns.rs:368:13
That's a new one for me. Seems almost like a malformed S3 response.
Running into S3 rate limits when trying to clean up a very large dataset with dataset.cleanup_old_versions. Can't seem to control this via LANCE_IO_THREADS.
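Until the concurrency is configurable, one possible stopgap for the rate limiting described here is to retry the cleanup call with exponential backoff. The sketch below is generic Python, not a Lance API: `retry_with_backoff` and its parameters are hypothetical, it assumes the rate-limit failure surfaces as an OSError (as the errors in this thread do), and it assumes re-running the cleanup after a partial failure is safe, which has not been confirmed here.

```python
import random
import time


def retry_with_backoff(fn, *, attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Out of retries: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Hypothetical usage with a Lance dataset object named `dataset`:
# retry_with_backoff(dataset.cleanup_old_versions)
```

This only spaces out whole cleanup attempts; it does not lower the number of requests issued within a single call.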