Closed kapadia closed 10 months ago
I was able to open this Zarr file successfully using Xarray, with no errors. I'm on Zarr v2.14.2 and Xarray v2023.2.0.
<xarray.Dataset>
Dimensions: (Y: 718, X: 791)
Coordinates:
* X (X) float64 1.021e+05 1.024e+05 1.027e+05 ... 3.389e+05 3.392e+05
* Y (Y) float64 2.827e+06 2.826e+06 2.826e+06 ... 2.612e+06 2.612e+06
Data variables:
Band1 (Y, X) float32 ...
Band2 (Y, X) float32 ...
Band3 (Y, X) float32 ...
After discussion around this example, here is the code used to generate the example file:
import requests
import pandas as pd
import numpy as np
import dask.array as da
import dask
from dask.distributed import Client
import zarr
from numcodecs import Zstd
import xarray as xr
import rasterio as rio
from rasterio.windows import Window
def read_raster(uri, window):
with rio.open(uri) as src:
arr = src.read(window=window)
return arr
def read_stac_item(uri):
r = requests.get(uri)
feature = r.json()
datetime = pd.to_datetime(feature['properties']['datetime'])
asset = feature['assets']['sr']
bands = [item['common_name'] for item in asset['eo:bands']]
band_count = len(bands)
image_uri = asset['href']
height = 256
width = 256
dtype = np.int16
window = Window(col_off=0, row_off=0, width=width, height=height)
image_arr = da.from_delayed(
dask.delayed(read_raster)(image_uri, window),
shape=(band_count, height, width),
dtype=dtype)
return xr.DataArray(image_arr[None, :],
coords=dict(
datetime=[datetime],
band=bands
),
dims=["datetime", "band", "row", "col"],
)
def main():
client = Client(processes=False)
base_url = "https://www.planet.com/data/stac/fusion/14N/29E-188N"
collection_url = f"{base_url}/collection.json"
r = requests.get(collection_url)
collection = r.json()
item_uris = [
f"{base_url}/{link['href']}" for link in collection['links'] if link['rel'] == 'item']
# Probe one image to get CRS
r_item = requests.get(item_uris[0])
image_uri = r_item.json()['assets']['sr']['href']
with rio.open(image_uri) as src:
crs = src.crs.to_wkt()
transform = list(src.transform)[0:6]
futures = client.map(read_stac_item, item_uris)
arrays = client.gather(futures)
ds = xr.concat(arrays, dim='datetime', combine_attrs="identical").chunk(
datetime=-1, band=-1, row=32, col=32)
ds['datetime'] = pd.to_datetime(ds.datetime)
ds.attrs = {
**ds.attrs,
"_CRS": crs,
"_TRANSFORM": transform
}
dataset = xr.Dataset({
"sr": ds
})
destination = "planet-fusion.zarr"
dataset.to_zarr(
destination,
encoding={
variable: {
"compressor": zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
} for variable in dataset
})
main()
A new file is available that uses the proposal in #19.
_TRANSFORM
to _GeoTransform
https://storage.googleapis.com/pl-amit-public/geozarr/planet-fusion-20230510.zarr.zip
In [1]: import xarray as xr
In [2]: ds = xr.open_zarr('planet-fusion.zarr')
In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions: (band: 4, datetime: 365, row: 256, col: 256)
Coordinates:
* band (band) <U5 'blue' 'green' 'red' 'nir'
* datetime (datetime) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Dimensions without coordinates: row, col
Data variables:
sr (datetime, band, row, col) int16 dask.array<chunksize=(365, 4, 32, 32), meta=np.ndarray>
In [4]: ds.sr
Out[4]:
<xarray.DataArray 'sr' (datetime: 365, band: 4, row: 256, col: 256)>
dask.array<open_dataset-78e64952e25134fc0fcda392329a7e9bsr, shape=(365, 4, 256, 256), dtype=int16, chunksize=(365, 4, 32, 32), chunktype=numpy.ndarray>
Coordinates:
* band (band) <U5 'blue' 'green' 'red' 'nir'
* datetime (datetime) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Dimensions without coordinates: row, col
Attributes:
GeoTransform: [3.0, 0.0, 696000.0, 0.0, -3.0, 4536000.0]
_CRS: PROJCS["WGS 84 / UTM zone 14N",GEOGCS["WGS 84",DATUM["WGS_...
In [5]: xr.__version__
Out[5]: '2023.4.2'
cc. @briannapagan
Per request from our last meeting, I’m sharing a Zarr file containing spatiotemporal observations of remote sensing imagery. Following GDAL's Zarr documentation, this file contains a _CRS attribute in WKT format. Additionally, an affine transform is also appended allowing for pixel to world coordinate conversions. Following GDAL’s current behavior, these attributes are stored on the Zarr Array rather than the Zarr Group.
https://storage.googleapis.com/pl-amit-public/geozarr/planet-fusion.zarr.zip
Two issues were observed in this process:
If you'd like to observe the second issue, it can be reproduced using the following commands.
I hope the file and these observations are useful to this group.
Edit:
This is the metadata associated with each array in the file above