google / array_record

Apache License 2.0

GCS support #120

Open kentslaney opened 1 month ago

kentslaney commented 1 month ago

While trying to load a TFDS dataset from GCS, I encountered this error:

/usr/local/lib/python3.10/dist-packages/array_record/python/array_record_data_source.py in _create_reader
    176 def _create_reader(filename: epath.PathLike):
    177   """Returns an ArrayRecordReader for the given filename."""
--> 178   return array_record_module.ArrayRecordReader(
    179       filename,
    180       options="readahead_buffer_size:0",

RuntimeError: open() failed: No such file or directory; opening gs://ref_coco/ref_coco/refcoco_unc/1.1.0/ref_coco-train.array_record-00000-of-00032

See the minimal reproducible example here:

https://colab.research.google.com/drive/1iezVDZBJrWtP3fVTpAMNGb6qfENZNKuD?usp=sharing

kentslaney commented 1 month ago

I should mention that I'm not currently blocked by this; the dataset is small enough to fit on disk.
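
For anyone hitting the same error, the local-disk workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: `stage_locally` and `_gfile_copy` are hypothetical helper names (not part of array_record or TFDS), the cache directory is arbitrary, and the default copy step assumes TensorFlow is installed so that `tf.io.gfile.copy` can read `gs://` paths.

```python
import os
from typing import Callable, Iterable, List


def _gfile_copy(src: str, dst: str) -> None:
    """Default copier (assumption: TensorFlow is available and can read gs:// paths)."""
    # Imported lazily so the helper itself does not require TensorFlow.
    import tensorflow as tf

    tf.io.gfile.copy(src, dst, overwrite=True)


def stage_locally(
    remote_paths: Iterable[str],
    cache_dir: str,
    copy_fn: Callable[[str, str], None] = _gfile_copy,
) -> List[str]:
    """Copies each shard into cache_dir and returns the local paths.

    Hypothetical helper: shards already present in cache_dir are not copied again.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_paths = []
    for remote in remote_paths:
        # Keep the shard's file name, e.g. "...array_record-00000-of-00032".
        local = os.path.join(cache_dir, os.path.basename(remote))
        if not os.path.exists(local):
            copy_fn(remote, local)
        local_paths.append(local)
    return local_paths
```

The returned local paths can then be opened in place of the `gs://` ones, since the `open()` failure above only concerns the remote path. This trades disk space for compatibility, so it only makes sense when the dataset fits on disk, as it does here.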