Closed dmcguire81 closed 4 years ago
Which pyarrow version are you using that exhibits this hang? Does using a different pyarrow version solve the issue?
I didn't have to specify the pyarrow version - I'm assuming it's whatever is installed as a dependency of petastorm==0.9.5, but I'll check.
Looks like pyarrow==1.0.1.
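For anyone comparing environments, here is a minimal sketch of one way to report the installed versions of the packages discussed in this thread. The helper name `installed_versions` is hypothetical and not part of Petastorm or pyarrow. It relies only on the standard library's `importlib.metadata` (Python 3.8+) and assumes the packages were installed under the distribution names `petastorm`, `pyarrow`, and `s3fs`.

```python
from importlib import metadata


def installed_versions(packages=("petastorm", "pyarrow", "s3fs")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in the environment where the hang occurs shows which pyarrow version was pulled in as a transitive dependency, without having to import the packages themselves.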
I was able to get a repro test case that was 100% pyarrow, so I'll close this. However, I would expect the impact to be fairly pervasive if it includes all storage protocols, so it would be a good idea to separately track a work-around (perhaps downgrading the version of pyarrow), if anyone else sees similar problems.
Thank you for the investigation. It's a nasty one. I would guess 0.15.1 is also impacted given our CI hangs from time to time with similar symptoms and it uses pyarrow 0.15.1.
Summary
Interaction between Petastorm and S3FS seems to be unusable, and it's unclear how much testing and exercise this has gotten within the Petastorm project itself and in the wider community, because basic operations (make_reader, make_batch_reader) simply don't work at all. Breakdowns in the interaction could fall anywhere between this project, pyarrow (for S3FSWrapper) and s3fs, but we're consuming Petastorm directly, so we're starting here.
Tested Versions
This had to be tested on an earlier version of s3fs (0.4.2), because the more current versions (>=0.5.0) have a different problem with the aiobotocore wrapper leaking async coroutines into Petastorm. There will be a separate defect for that, and we'll take both to that project.
Repro Test Case
Setup
Test