Got it. I assume our idea of parameterizing `make_reader`/`make_batch_reader` with a file-system instance would solve this. Right?
The merged code solved this, but parameterizing with a `FileSystem` might allow me to work around the deadlock in the interaction between `pyarrow` and `s3fs`. As mentioned, I'll pull you into that ongoing conversation.
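
For context, the workaround under discussion might look something like the sketch below. The `filesystem` keyword on `make_batch_reader` is the *proposed* parametrization, not an API that necessarily existed at the time of this thread; the retry settings and dataset URL are illustrative. `config_kwargs` is a real `s3fs.S3FileSystem` argument that is forwarded to the underlying botocore client configuration.

```python
import s3fs
from petastorm import make_batch_reader

# Pre-configure the S3 client with the AWS-recommended "standard" retry
# mode instead of the "legacy" default; config_kwargs is forwarded to the
# botocore client Config.
fs = s3fs.S3FileSystem(
    config_kwargs={"retries": {"max_attempts": 10, "mode": "standard"}}
)

# Proposed parametrization (hypothetical kwarg): hand the pre-built
# filesystem to the reader so Petastorm does not instantiate its own,
# unconfigurable s3fs client.
with make_batch_reader("s3://bucket/dataset", filesystem=fs) as reader:
    for batch in reader:
        ...
```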
The problem is that their stock configuration uses the legacy retry mode, which is not the AWS recommendation. Since Petastorm instantiates the `s3fs` client when using either `make_reader` or `make_batch_reader`, there's no opportunity to configure it correctly, and brute-force changes to the `~/.aws/config` file impact any code using `boto3`, even if unrelated, so there is no fine-grained control over the number of retries per use case.
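
To make that trade-off concrete, here is a sketch of the two configuration paths. The `retry_mode`/`max_attempts` settings in `~/.aws/config` and the per-client `botocore.config.Config` are standard boto3/botocore mechanisms; the values shown are just examples.

```python
import boto3
from botocore.config import Config

# Global approach (the "brute force" change above): editing ~/.aws/config
# as below alters retry behavior for *every* boto3/botocore client in the
# process, whether or not it is related to Petastorm:
#
#   [default]
#   retry_mode = standard
#   max_attempts = 10

# Fine-grained approach: scope the retry policy to a single client, so
# unrelated boto3 code keeps its own settings.
retry_config = Config(retries={"max_attempts": 10, "mode": "standard"})
s3 = boto3.client("s3", config=retry_config)
```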