Closed shteren1 closed 3 years ago
I don't know how those errors appear, but I suspect this isn't going to work. s3fs is built on fsspec, which says that files are only seekable in read mode, but HDF5 needs to seek in all modes.
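To make the seekability point concrete, here is a minimal sketch using only the standard library. `AppendOnlyStream` is a hypothetical stand-in for a write-mode object-store file (it is not the real `S3File` class), and `supports_random_access` is an illustrative helper, not the check h5py itself performs.

```python
import io


class AppendOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a write-mode object-store file:
    it accepts writes but reports that it cannot seek."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, data):
        # Collect the bytes; a real object-store file would buffer and upload them.
        self.chunks.append(bytes(data))
        return len(data)


def supports_random_access(fileobj):
    """Return True if fileobj claims to support seeking (illustrative helper)."""
    seekable = getattr(fileobj, "seekable", None)
    return bool(seekable and seekable())


print(supports_random_access(io.BytesIO()))        # in-memory buffer
print(supports_random_access(AppendOnlyStream()))  # write-only stream
```

A check like this, run on the object returned by s3fs in the mode you intend to use, should show quickly whether it can serve a library that needs to seek.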
The no attribute 'seek' error is similar to #1434 and #1530. I thought we had fixed that. Are you sure you're using h5py 3.1? Check with h5py.version.info.
HDF5's "file drivers" API is not really well suited to data in object stores like S3. HDF5 assumes that it's fairly cheap to jump around in a file and read/write small amounts of data. Object stores are based on storing/retrieving a whole 'blob' at once. The HDF Group has been trying to address this, first with the HSDS system, and then with the virtual object layer (VOL), which is new in HDF5 1.12.
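The mismatch between the two access models can be sketched as follows. This is not something proposed in this thread, just an illustration under stated assumptions: the whole file fits in memory, and `write_as_single_blob` and `build_with_backpatch` are hypothetical names. The toy format writes a placeholder header, then the body, then seeks back to fill in the header, which is the kind of access a write-only stream cannot offer.

```python
import io


def write_as_single_blob(build, sink):
    """Run build(buffer) on a seekable in-memory buffer, then hand the
    finished bytes to sink in a single write. Returns the byte count."""
    buffer = io.BytesIO()
    build(buffer)                 # build may seek freely inside the buffer
    payload = buffer.getvalue()
    sink.write(payload)           # one whole-blob write, no seeking on sink
    return len(payload)


def build_with_backpatch(buf):
    """Toy format: a 4-byte big-endian length header that is only known
    after the body has been written, so the writer must seek back."""
    buf.write(b"\x00\x00\x00\x00")            # placeholder header
    body = b"hello"
    buf.write(body)
    buf.seek(0)                               # jump back to the start
    buf.write(len(body).to_bytes(4, "big"))   # patch the real length in


sink = io.BytesIO()  # stands in for a write-only remote file
written = write_as_single_blob(build_with_backpatch, sink)
```

With h5py, `build` could open `h5py.File(buf, 'w')` on the buffer, write its datasets and close the h5py file before returning, and `sink` could be a file opened with s3fs in 'wb' mode. That covers creating a new file only; appending would still mean downloading and re-uploading the whole object.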
Closing as I don't think there's any way to fix this. HDF5 expects that files are seekable, and s3fs files are only seekable in read mode.
Hi, while reading existing files from S3 storage with s3fs works like a charm (replace 'ab' with 'rb' and 'a' with 'r' in the example below), trying to write files or append to existing files fails.
Tested package builds from conda (latest packages currently available):
s3fs 0.5.2 pyhd8ed1ab_0 conda-forge
h5py 3.1.0 nompi_py37h1e651dc_100 defaults
s3fs supports writing and appending to files in s3 for json/csv/other text files.
Consider the following example:
This fails with the following error:
ValueError: Invalid value of 'fileobj' argument; must equal to file-like object if specified.
This is raised on line 407 of h5py/_hl/files.py, which doesn't make much sense: it's basically testing the s3file object against itself and returning False. When I open the same file with 'rb' instead of 'ab' using s3fs and 'r' using h5py.File, this test returns True. I tried to meddle a bit with the h5py code and commented out this check, but the code still fails later with this error:
AttributeError: 'S3File' object has no attribute 'seek'
Thanks, Yotam Stern.