opendatacube / odc-stac

Load STAC items into xarray Datasets.
Apache License 2.0
140 stars 20 forks source link

Missing CRS on data file stops data from being loaded #121

Closed alexgleith closed 1 year ago

alexgleith commented 1 year ago

I have an example I'm exploring currently and have found an issue, probably with my STAC doc, which I'm creating with rio_stac.

This notebook reproduces the error: https://gist.github.com/alexgleith/9a16a56410cdb1fd2feff80ad8dabd0f

In short, the Item has one Asset with multiple Bands, and I'm manually setting the properties:epsg field, but odc.stac.load complains about the src.crs or dst.crs.

Stack trace: ``` python --------------------------------------------------------------------------- AssertionError Traceback (most recent call last) Cell In[4], line 1 ----> 1 odc.stac.load([item]) File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:610](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:610), in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw) 607 if progress is not None: 608 _work = progress(SizedIterable(_work, total_tasks)) --> 610 for _ in _work: 611 pass 613 return _with_debug_info(ds, tasks=_tasks) File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_utils.py:38](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_utils.py:38), in pmap(func, inputs, pool) 34 """ 35 Wrapper for ThreadPoolExecutor.map 36 """ 37 if pool is None: ---> 38 yield from map(func, inputs) 39 return 41 if isinstance(pool, int): File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:601](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:601), in load.._do_one(task) 595 srcs = [ 596 src 597 for src in (_parsed[idx].get(band, None) for idx, band in task.srcs) 598 if src is not None 599 ] 600 with rio_env(**_rio_env): --> 601 _ = _fill_2d_slice(srcs, task.dst_gbox, task.cfg, dst_slice) 602 t, y, x = task.idx_tyx 603 return (task.band, t, y, x) File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:698](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_load.py:698), in _fill_2d_slice(srcs, dst_gbox, cfg, dst) 695 return dst 697 src, *rest = srcs --> 698 _roi, pix = rio_read(src, cfg, dst_gbox, dst=dst) 700 for src in rest: 701 # first valid pixel takes precedence over others 702 _roi, pix = rio_read(src, cfg, dst_gbox) File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:186](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:186), in rio_read(src, cfg, dst_geobox, dst) 163 """ 164 Internal read method. 165 (...) 182 183 """ 185 try: --> 186 return _rio_read(src, cfg, dst_geobox, dst) 187 except rasterio.errors.RasterioIOError as e: 188 if cfg.fail_on_error: File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:226](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:226), in _rio_read(src, cfg, dst_geobox, dst) 223 if src.band > rdr.count: 224 raise ValueError(f"No band {src.band} in '{src.uri}'") --> 226 rr = _reproject_info_from_rio(rdr, dst_geobox, ttol=ttol) 228 if cfg.use_overviews and rr.read_shrink > 1: 229 ovr_idx = _pick_overview(rr.read_shrink, rdr.overviews(src.band)) File [/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:100](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/stac/_reader.py:100), in _reproject_info_from_rio(rdr, dst_geobox, ttol) 97 def _reproject_info_from_rio( 98 rdr: rasterio.DatasetReader, dst_geobox: GeoBox, ttol: float 99 ) -> ReprojectInfo: --> 100 return compute_reproject_roi(rio_geobox(rdr), dst_geobox, ttol=ttol) File [/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:492](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:492), in compute_reproject_roi(src, dst, ttol, stol, padding, align) 488 # pylint: disable=too-many-locals 490 pts_per_side = 5 --> 492 tr = native_pix_transform(src, dst) 494 if tr.linear is None: 495 padding = 1 if padding is None else padding File [/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:340](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:340), in native_pix_transform(src, dst) 337 if isinstance(src, GeoBox) and isinstance(dst, GeoBox) and src.crs == dst.crs: 338 return _same_crs_pix_transform(src, dst) --> 340 return GbxPointTransform(src, dst) File [/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:105](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/odc/geo/overlap.py:105), in GbxPointTransform.__init__(self, src, dst, back) 99 def __init__( 100 self, 101 src: GeoBoxBase, 102 dst: GeoBoxBase, 103 back: Optional["GbxPointTransform"] = None, 104 ): --> 105 assert src.crs is not None and dst.crs is not None 106 self._src = src 107 self._dst = dst ```
alexgleith commented 1 year ago

Hmm... might be because the CRS isn't set on the .tif file actually.

import rasterio

file_url = "https://deppcpublicstorage.blob.core.windows.net/output/landsat-mosaic/2022/landsat-mosaic_2022_100_47.tif"

with rasterio.open(file_url) as src:
    print(src.crs)

None

Kirill888 commented 1 year ago

Yep, this error should be detected way earlier obviously, just before calling out to _reproject_info_from_rio

https://github.com/opendatacube/odc-stac/blob/52a016be2115f180e7059f67bcc6106fbeba7e8d/odc/stac/_reader.py#L219-L226

Or maybe here

https://github.com/opendatacube/odc-stac/blob/52a016be2115f180e7059f67bcc6106fbeba7e8d/odc/stac/_reader.py#L97-L100

if no CRS and no GCPs, it should raise a more meaningful error quoting source url.

probably worth having "fail if no CRS option" in, rio_geobox method too.

alexgleith commented 1 year ago

If the metadata (STAC) has a CRS, can't we just assume that it's the CRS?

Or is there something in Rio that needs it to be set on the actual data?

(I know it's simply bad practice to have a data file with no CRS, but other software will still load it.)

image
Kirill888 commented 1 year ago

It's possible to implement fallback for missing CRS data, but then you also need fallback for missing transform, for the consistency of the interface. And if you offer fallback mechanism, you might argue that override is also required. And since we support loading data from STAC items that do not include projection extension, you need to support overrides and fallbacks specified manually by the user. Eventually you end up with rather complicated logic as to when, if at all, data will be loadable and with what projection. That logic better be documented in a clear enough way, which will be hard to do with a small amount of text, and will probably need a diagram. All this complication pushed on to the user of the library, while enabling data providers to be sloppy with their output.

If someone can come up with an "advanced option" mechanism using user supplied logic to handle "broken data" like that in bespoke, data source specific way, then we should consider that. Attempting to make broken data "user friendly" with some internal complicated logic, without complicating things for the user on a "happy path", will be hard to pull off.

Action for this issue right now is:

alexgleith commented 1 year ago

Thanks, @Kirill888, I agree with you. I'll close this issue.