@alexgleith thanks for this. Can you mention the versions of dask/distributed in your setup? When running this on Planetary Computer I get no errors and the imagery looks fine.
Dask versions:
dask==2023.9.3
dask-glm==0.3.0
dask-image==2023.8.1
dask-ml==2023.3.24
Also:
imagecodecs==2023.9.18
tifffile==2023.9.26
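In case the distributed version is also needed, here is a quick way (my addition, not from the thread) to print the installed versions of the packages mentioned above, using only the standard library:

# Hypothetical helper, not part of the original report: report installed
# versions of the packages discussed above.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("dask", "distributed", "odc-geo", "odc-stac", "tifffile", "imagecodecs"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")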
Can't reproduce 😦 @alexgleith.
Code I'm running:
import os

from dask.distributed import Client as DaskClient
from odc.geo.cog import save_cog_with_dask
from odc.stac import configure_rio, load
from pystac_client import Client


def main(dask_client):
    tk = os.environ.get("EARTHDATA_TOKEN", None)
    if tk is None:
        with open("safe/ers.tk", "rt") as f:
            tk = f.read().strip()

    catalog = "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/"
    # Searching across both landsat and sentinel at 30 m
    collections = ["HLSS30.v2.0", "HLSL30.v2.0"]

    client = Client.open(catalog)

    # BBOX over Precipitous Bluff in Tasmania
    ll = (-43.55, 146.45)
    ur = (-43.35, 146.75)
    bbox = [ll[1], ll[0], ur[1], ur[0]]

    print("Querying STAC catalog")
    # Search for items in the collection
    items = client.search(
        collections=collections,
        bbox=bbox,
        datetime="2023-07-01/2023-09-30",
    ).item_collection()

    # Finds about 109 items
    print(f"Found {len(items)} items")

    rgb_bands = ("B04", "B03", "B02")

    # Configure GDAL. You need to export your earthdata token as an environment variable.
    configure_rio(cloud_defaults=True, GDAL_HTTP_HEADERS=f"Authorization: Bearer {tk}")

    data = load(
        items,
        bbox=bbox,
        crs=6933,
        resolution=30,
        chunks={"x": 2500, "y": 2500, "time": 1},
        groupby="solar_day",
        bands=[*rgb_bands, "Fmask"],
    )

    mask_bitfields = [1]
    bitmask = 0
    for field in mask_bitfields:
        bitmask |= 1 << field

    # Get cloud mask
    cloud_mask = (data["Fmask"].astype(int) & bitmask) != 0

    # Contract and then expand the cloud mask to remove small areas
    if True:
        from odc.algo import mask_cleanup

        is_cloud = mask_cleanup(cloud_mask, [("opening", 2), ("dilation", 3)])
    else:
        is_cloud = cloud_mask

    masked = data.where(~is_cloud)
    masked = masked.drop("Fmask")

    # Create a simple cloud-free median, still in memory
    median = masked.median("time")

    print(f"Running: {dask_client.dashboard_link}")
    path = save_cog_with_dask(
        median.odc.to_rgba(rgb_bands, vmin=0, vmax=1000),
        "rgba.tif",
        compression="webp",
        level=100,
        client=dask_client,
    ).compute()

    print(f"Saved to: {path}")
    return path


if __name__ == "__main__":
    print("Starting Dask")
    dask_client = DaskClient(n_workers=1, threads_per_worker=None)
    print(f"Dask Dashboard: {dask_client.dashboard_link}")
    print("Starting main")
    main(dask_client)
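As an aside (not part of the script above), a quick way to check whether the written rgba.tif actually contains non-zero pixels, using rasterio from the environment below:

# Hedged sanity check, not in the original script: read the output COG back
# and report whether it is all zeros.
import rasterio

with rasterio.open("rgba.tif") as src:
    arr = src.read()

print("shape:", arr.shape, "dtype:", arr.dtype)
print("min/max:", arr.min(), arr.max())
print("all zeros:", not arr.any())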
In this conda environment:
name: odc-geo-cog-test
channels:
  - conda-forge
dependencies:
  - python =3.10
  # for new odc-geo
  - tifffile
  - imagecodecs
  - pyproj
  - rasterio
  - shapely
  - dask
  - xarray
  - cachetools
  # for the test notebook
  - pystac-client
  - datacube
  - odc-stac
  - odc-algo
  - pip:
      - odc-geo==0.4.2rc1
Output of conda list --explicit is below.
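A rough sanity check (my addition, not from the thread) that the environment above exposes everything the script imports:

# Confirm the key imports resolve and report dask/distributed versions.
import dask
import distributed
from odc.algo import mask_cleanup
from odc.geo.cog import save_cog_with_dask
from odc.stac import configure_rio, load
from pystac_client import Client

print("dask", dask.__version__, "distributed", distributed.__version__)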
@alexgleith there is a new release candidate that fixes Dask key aliasing. I doubt that was the problem in your case, but who knows. If you could try your use case in a "clean, from scratch" environment, that would be great.
As mentioned, I used odc-stac to load data and create a median, then used the new dask function to write to S3.
The file wrote fine, but it only contains zeros.
ODC-related libraries are as follows:
Code I used is below.