intake / intake-stac

Intake interface to STAC data catalogs
https://intake-stac.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
110 stars 25 forks source link

Xarray attributes (CRS, transform) missing #91

Closed g2giovanni closed 2 years ago

g2giovanni commented 3 years ago

I have issues opening some catalog on https://earth-search.aws.element84.com/v0. When I try to make some operation with rioxarray (e.g., a clip operation) I get the error:

MissingCRS: CRS not found. Please set the CRS with 'rio.write_crs()'

That's because into the attributes of the xarray returned from the to_dask() method attributes transform, crs and res have not be set. I noticed this is happening only for catalogues of the beginning of 2018. For this reason, rioxarry can't georeference data in the proper way. Why those attributes have not been set?

A snippet of code to reproduce the error:

from shapely import wkt
import intake
import intake_stac
import satsearch

wkt_geometry = "Polygon ((9.59935220198788208 44.99328353692637705, 9.82295202809843637 44.99189113160298348, 9.82469113104509972 45.11301999240544092, 9.60061886478441018 45.11441825799879268, 9.59935220198788208 44.99328353692637705))"

start_date = "2018-01-01"
end_date = "2018-01-03"

results = satsearch.Search.search (
            url="https://earth-search.aws.element84.com/v0",
            intersects=mapping(wkt.loads(wkt_geometry)),
            datetime="%s/%s" % (start_date, end_date),
            query={"eo:cloud_cover": {"lte": 100}},
            collections=["sentinel-s2-l2a"],
        )

items = results.items()

for item in items:
        print("Processing item {} for asset {}".format(item, asset_key))
        single_item = intake.open_stac_item(item) # stac_item_collection[str(item_by_date)]

        asset_xarray = single_item['SCL'](chunks=dict(band=1, y=2048, x=2048)).to_dask()
        asset_xarray.rio.set_nodata(0)
        asset_clipped = asset_xarray.rio.clip([mapping(_wkt_bounds)], crs=4326, all_touched=True, drop=True)
        xarray_all_patches.append(asset_clipped)

    mosaic = rioxarray.merge.merge_arrays(xarray_all_patches, precision=50, nodata=nodataval)

The clip operation with rioxarray is the one that fails (asset_clipped = asset_xarray.rio.clip([mapping(_wkt_bounds)], crs=4326, all_touched=True, drop=True))

scottyhq commented 3 years ago

@g2giovanni Currently intake-stac returns a standard xarray DataArray. Note that rioxarray is an extension of xarray, and the CRS is only added to the xarray object if you open a file via rioxarray or if you explicitly use the xarray_dataarray.rio.write_crs() method as the Error message indicates.

So before running methods like rio.clip() you can do something like asset_xarray.rio.write_crs(item.properties['proj:epsg'], inplace=True)

There is some discussion of using rioxarray behind the scenes in a future version of intake-stac (https://github.com/intake/intake-stac/pull/75)

matthewhanson commented 3 years ago

@g2giovanni I'm also noticing you are using the 'sentinel-s2-l2a' collection rather than 'sentinel-s2-l2a-cogs' which is probably what you want since they are COGs rather than jpeg2k and not in a requester pays bucket

Also, have a look at pystac-client which is replacing sat-search

scottyhq commented 3 years ago

@g2giovanni I'm also noticing you are using the 'sentinel-s2-l2a' collection rather than 'sentinel-s2-l2a-cogs' which is probably what you want since they are COGs rather than jpeg2k and not in a requester pays bucket

@matthewhanson seems that not all L2a data is available as COGs. I actually tried this when looking into the issue here:

import intake
from pystac_client import Client # https://github.com/stac-utils/pystac-client

aoi = {'type': 'Polygon',
 'coordinates': [[[9.59, 44.99],
   [9.82, 44.99],
   [9.82, 45.11],
   [9.60, 45.11],
   [9.59, 44.99]]]}

catalog = Client.open("https://earth-search.aws.element84.com/v0")
results = catalog.search(#collections=['sentinel-s2-l2a'], #2 found
                         collections=['sentinel-s2-l2a-cogs'], #0 found
                         intersects=aoi,
                         datetime='2018-01-01/2018-01-03',
                         )

print(f"{results.matched()} items found")
RichardScottOZ commented 3 years ago

Definitely some things missing here and there.

g2giovanni commented 3 years ago

@scottyhq you're right. It's the reason why I use the "sentinel-s2-l2a" for older years. Anyway, I think I found the problem: some geotiff file is not georeferenced properly in the collection. Actually, I bypassed the problem, georeferencing images with this kind of problem. Maybe could those files be fixed?

scottyhq commented 3 years ago

I think I found the problem: some geotiff file is not georeferenced properly in the collection. Actually, I bypassed the problem, georeferencing images with this kind of problem. Maybe could those files be fixed?

@g2giovanni if there is a metadata issue with the sentinel-s2-l2a-cogs you can open an issue here describing in detail https://github.com/cirrus-geo/cirrus-earth-search