opendatacube / odc-stats

Statistician is a framework of tools for generating statistical summaries of large collections of EO data managed in an ODC instance.
Apache License 2.0

misplaced chunk in geomedian with dask #100

Closed emmaai closed 9 months ago

emmaai commented 11 months ago

"Misplaced" chunks appear in the geomedian output, as shown in the picture below. My current theory is that graph unpacking is triggered by persist before the graph build is complete, and then something goes wrong in the middle. Either it's a bug in dask, or we shouldn't be doing this at all. It seems to happen randomly, and I don't have a reliable way to reproduce it.

[image: misplaced_chunk]
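Since the failure is intermittent, one way to flag it after the fact is to compare an output tile block by block against a trusted reference (for example, the same tile recomputed without dask) and report any block that actually matches a different block position. The following is only a minimal sketch under that assumption: `find_misplaced_chunks` is a hypothetical helper, not part of odc-stats, odc-algo, or dask, and it assumes 2-D arrays with a known chunk size.

```python
import numpy as np


def find_misplaced_chunks(result, reference, chunk):
    """Compare two 2-D arrays block by block (hypothetical helper).

    Returns a list of (block_index, source_index) pairs: block_index is a
    block of `result` that differs from the reference, and source_index is
    the reference block it is identical to (None if no block matches).
    """
    cy, cx = chunk
    ny = -(-reference.shape[0] // cy)  # ceiling division
    nx = -(-reference.shape[1] // cx)

    def block(arr, iy, ix):
        return arr[iy * cy:(iy + 1) * cy, ix * cx:(ix + 1) * cx]

    findings = []
    for iy in range(ny):
        for ix in range(nx):
            got = block(result, iy, ix)
            if np.array_equal(got, block(reference, iy, ix)):
                continue  # this block is where it should be
            # Look for another reference block with identical content.
            source = None
            for jy in range(ny):
                for jx in range(nx):
                    if (jy, jx) == (iy, ix):
                        continue
                    if np.array_equal(got, block(reference, jy, jx)):
                        source = (jy, jx)
                        break
                if source is not None:
                    break
            findings.append(((iy, ix), source))
    return findings


# Synthetic demonstration: copy block (1, 0) of the reference into block (0, 1).
reference = np.arange(64).reshape(8, 8)
result = reference.copy()
result[0:4, 4:8] = reference[4:8, 0:4]
print(find_misplaced_chunks(result, reference, (4, 4)))  # [((0, 1), (1, 0))]
```

A check like this does not explain the cause, but it could turn "happens randomly" into a logged list of affected block positions per run.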

robbibt commented 11 months ago

Hmm, a user recently emailed us about a similar-looking issue:

[image]

They're not doing geomedians, but they are computing large-scale medians/means/stdev using Dask. The previous assumption was that this was caused by the S3 random data access problem we've talked about previously on Teams, but I wonder if it might actually be related to this issue too.

(or maybe it's completely different, hard to tell from those screenshots)

emmaai commented 9 months ago

Can confirm none of the above fixed the issue; it happened again in the 2015 geomedian test processing: http://dea-public-data-dev.s3-website-ap-southeast-2.amazonaws.com/?prefix=test/gm-ls8-dilation-6-cloud-opening-5-v2/3-0-0/x40/y13/2015--P1Y/

emmaai commented 9 months ago

Fixed by https://github.com/opendatacube/odc-algo/pull/4

SpacemanPaul commented 9 months ago

Thanks for your work on this @emmaai ! I know this was a tough one.