Open AdityaCohere opened 1 week ago
The retry decision happens here (or one of the other methods in the file): https://github.com/google/tensorstore/blob/2a44dc7d9b08f6f46e597c01457efdf1e18e54db/tensorstore/kvstore/s3/s3_metadata.cc#L218
You could turn on logging with the environment variable TENSORSTORE_VERBOSE_LOGGING=s3=1.
Unfortunately #149 doesn't integrate with any kvstore at present.
Yeah, I've tried turning on logging as well, with no output mentioning any retries. I see that the 429 response is received, and nothing beyond that other than a failure. I will repro this and get the debug logs.
Ah, this is unfortunate to hear. Would it be non-trivial to integrate?
It'd take some work. I guess the amount of code required would be smaller than the current s3 kvstore, as it could defer to the C++ S3 SDK.
However, the C++ SDK is a fairly heavy dependency: the S3 C CRT looks much lighter. That's next, when I find the time.
With s3=1 you should see a line from s3_key_value_store.cc:435 ReadTask or s3_key_value_store.cc:717 WriteTask or such. Could you attach those lines?
Also, you could attempt to run localstack_test against OCI object storage.
git clone https://github.com/google/tensorstore.git
cd tensorstore
./bazelisk.py build tensorstore/kvstore/s3:localstack_test
./bazel-bin/tensorstore/kvstore/s3/localstack_test --tensorstore_verbose_logging=all=2 --localstack_endpoint=... <other flags>
Hey, I've tried turning on logging and I don't seem to see any logs regarding s3 being printed.
I get a rate limited error but no read/write tasks.
I've tried TENSORSTORE_VERBOSE_LOGGING=s3=1 as well as TENSORSTORE_VERBOSE_LOGGING=all.
Is there anything else I can try?
I'm using TensorStore's key-value store driver with OCI Object Storage through its S3 compatibility layer, and I'm seeing that when the OCI Object Storage service sends 429s, TensorStore fails immediately without any retries. I tested this with debug logging enabled, which showed no retries either with the default value of 32 or with s3_request_retries set to a custom number. I also tried @sjperkins's PR to see whether it would enable retries, but I'm hitting Bazel build issues on an M1 Mac.
I was wondering if there is anything I could do to debug the lack of retries further, or any modifications I could make to force retries.
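One possible stopgap while the driver-level behavior is being debugged is to retry at the application level. The following is only a minimal sketch, not TensorStore's own retry logic: the helper name retry_on_rate_limit is hypothetical, and it assumes the rate-limit failure reaches Python as an exception whose message contains "429", which would need to be checked against the actual error text.

```python
import random
import time


def retry_on_rate_limit(op, max_attempts=5, initial_delay=1.0, max_delay=32.0,
                        sleep=time.sleep):
    """Call op() and retry with exponential backoff on rate-limit errors.

    Assumption (not confirmed by the thread): a 429 surfaces as an exception
    whose string form contains "429". Adjust is_rate_limited() as needed.
    """

    def is_rate_limited(exc):
        return "429" in str(exc)

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not rate limits.
            if attempt == max_attempts or not is_rate_limited(exc):
                raise
            # Full jitter: wait a random time in [0, delay], then double the cap.
            sleep(random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
```

With an already opened kvstore handle (called kv here purely for illustration), a read could be wrapped as retry_on_rate_limit(lambda: kv.read("some/key").result()). The sleep parameter is injectable so the backoff can be exercised in tests without real delays.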