opendatacube / eo-datasets

Easily write, validate and convert EO datasets and metadata.
Apache License 2.0

Support mixing remote accessory references #234

Open jeremyh opened 2 years ago

jeremyh commented 2 years ago

Reported by Belle and Toktam:

warnings.warn(
Traceback (most recent call last):
  File "le_lccs_odc.py", line 74, in <module>
    gridded_classification.run_classification(
  File "/home/jovyan/livingearth_lccs/le_lccs/le_utils/gridded_classification.py", line 366, in run_classification
    export_obj.write_xarray(l4_out_classification_array, **product.config())
  File "/home/jovyan/livingearth_lccs/le_lccs/le_export/gridded_export.py", line 187, in write_xarray
    p.add_accessory_file("lineage:static", data_xarray.attrs.get("accessories"))
  File "/home/jovyan/eo-datasets/eodatasets3/assemble.py", line 1025, in add_accessory_file
    self.note_accessory_file(*args, **kwargs)
  File "/home/jovyan/eo-datasets/eodatasets3/assemble.py", line 1584, in note_accessory_file
    self._checksum.add_file(Path(path))
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 97, in add_file
    hash_ = self._checksum(file_path)
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 115, in _checksum
    hash_ = calculate_file_hash(file_path)
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 42, in calculate_file_hash
    with Path(filename).open("rb") as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 's3:/dea-public-data/projects/LCCS/urban_mask.tif'

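This happens because DatasetAssembler writes a package locally and assumes all file references are local paths. The S3 URL is wrapped in Path(...) before checksumming, which also collapses the s3:// prefix to s3:/ (as seen in the error message), and the local open then fails.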
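
The single-slash path in the error message can be reproduced with pathlib alone. This is a minimal sketch, using PurePosixPath so it behaves the same on any platform; the variable names are illustrative only:

```python
from pathlib import PurePosixPath

# The accessory reference as the caller presumably supplied it.
url = "s3://dea-public-data/projects/LCCS/urban_mask.tif"

# pathlib treats the string as a filesystem path and collapses the
# repeated slash, so the URL scheme separator "://" becomes ":/".
mangled = str(PurePosixPath(url))

print(mangled)
```

Once the reference has been through pathlib, it is no longer a valid URL, so any remote handling would need to happen before the conversion to Path.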

These derivative processing systems write local files but are trying to reference remote accessory files.

I'm not certain if this is what we want to do (accessories were originally references to non-measurement extra files included in the package), but we should decide on the preferred way for people to do this.
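
One possible approach, sketched here as an assumption and not as what eodatasets3 actually does, is to detect URL-style references by their scheme and skip local checksumming for them. The helper names is_remote_reference and checksum_if_local are hypothetical, and SHA-1 is used only for illustration:

```python
import hashlib
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def is_remote_reference(ref: str) -> bool:
    """Return True if ref looks like a URL (e.g. s3://, https://).

    Hypothetical helper. Single-letter schemes are treated as Windows
    drive letters (C:\\...), not URLs. file:// URLs are also reported
    as remote here; a real implementation would need to handle them.
    """
    return len(urlparse(ref).scheme) > 1


def checksum_if_local(ref: str) -> Optional[str]:
    """Hash a local file, or return None for a remote reference.

    Hypothetical helper: remote files are referenced as-is and are
    not opened or checksummed.
    """
    if is_remote_reference(ref):
        return None
    digest = hashlib.sha1()
    with Path(ref).open("rb") as f:
        # Read in chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(4096), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Whether remote accessories should go without a checksum, or be fetched and hashed, is part of the open design question above.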

tebadi commented 2 years ago

I think we need to add support for accessory files stored on S3, but I'm keen to know what @omad and @SpacemanPaul think about this.

SpacemanPaul commented 2 years ago

Toktam's point makes sense to me. At a minimum, supporting accessory files stored remotely seems a reasonable use case given the community's increasing reliance on cloud storage.

tebadi commented 2 years ago

@jeremyh Happy for me to add support for this?

jeremyh commented 2 years ago

Yep!