benjimin closed this issue 2 years ago
datacube_wps/processes/witprocess.py:60: in process_data
re_wit = cal_area(aggregated)
datacube_wps/processes/witprocess.py:212: in cal_area
re = pd.merge(re, ((aggregated.TCW > wet_threshold).astype('int')
/env/lib/python3.8/site-packages/xarray/core/dataarray.py:929: in load
ds = self._to_temp_dataset().load(**kwargs)
/env/lib/python3.8/site-packages/xarray/core/dataset.py:865: in load
evaluated_data = da.compute(*lazy_data.values(), **kwargs)
/env/lib/python3.8/site-packages/dask/base.py:570: in compute
results = schedule(dsk, keys, **kwargs)
/env/lib/python3.8/site-packages/dask/threaded.py:79: in get
results = get_async(
/env/lib/python3.8/site-packages/dask/local.py:517: in get_async
raise_exception(exc, tb)
...
/env/lib/python3.8/site-packages/dask/core.py:122: in _execute_task
elif not ishashable(arg):
/env/lib/python3.8/site-packages/dask/core.py:20: in ishashable
hash(x)
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:1069: in __hash__
return hash((*self.shape, self.crs, self.affine))
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:259: in __hash__
return hash(self.to_wkt())
/env/lib/python3.8/site-packages/datacube/utils/geometry/_base.py:190: in to_wkt
return self._crs.to_wkt(pretty=pretty, version=version)
pyproj/_crs.pyx:457: in pyproj._crs.Base.to_wkt
pyproj/_crs.pyx:120: in pyproj._crs._to_wkt
pyproj/_crs.pyx:24: in pyproj._crs.cstrdecode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 3: invalid start byte
Intermittent and definitely not deterministic. On different occurrences, the same error relates to a slightly different byte sequence.
cstr = b' \xf16\\\t\x7f'
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1: invalid continuation byte
I'm not sure if this could be because of some kind of thread race in what dask is doing, or if it simply means there is a memory error (like a buffer overflow or similar corruption).
I've not looked at the code, but proj CRS objects are not thread safe, and your error looks very much like that problem: you can't create CRS objects and then pass them into a Dask worker. You will need to modify the code to create the CRS object IN the thread using it. I usually pass the CRS EPSG code in, and then create the CRS in the code running on the worker.
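A minimal sketch of the pattern described above, assuming `pyproj` is installed. The function name `area_in_crs` and the EPSG codes are illustrative, and a `ThreadPoolExecutor` stands in for dask's threaded scheduler; the key point is that only a plain EPSG integer crosses the thread boundary, and the CRS object is constructed inside the worker:

```python
# Sketch: never share a pyproj/datacube CRS object across threads.
# Instead, pass a plain (serializable) EPSG code to each worker and
# construct the CRS inside the thread that uses it.
from concurrent.futures import ThreadPoolExecutor

def area_in_crs(epsg_code: int) -> str:
    # Hypothetical worker function: build the CRS locally, in-thread.
    from pyproj import CRS
    crs = CRS.from_epsg(epsg_code)
    return crs.to_wkt()  # safe: this CRS instance is thread-local

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each thread builds its own CRS from the shared integer code.
    wkts = list(pool.map(area_in_crs, [4326] * 8))
```

The same idea applies to dask graphs: put the EPSG code (or WKT string) in the task arguments, not a pre-built CRS object.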
It seems this was a known symptom of pyproj not being threadsafe (and had impacted other dask applications). It looks like this was supposed to be mostly fixed in pyproj 3.1.0, earlier this year.
...and looks like the build was using pyproj 2.6.1.
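A quick way to check whether an environment predates the fix is to compare the installed version against the 3.1.0 threshold mentioned above. A stdlib-only sketch (the helper names are illustrative):

```python
# Sketch: check whether an installed pyproj version is at or above the
# 3.1.0 release said to contain the thread-safety fixes.
def version_tuple(v: str) -> tuple:
    # Parse "major.minor.patch" into a comparable tuple of ints.
    return tuple(int(p) for p in v.split(".")[:3])

def is_threadsafe_pyproj(installed: str, minimum: str = "3.1.0") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)
```

For example, `is_threadsafe_pyproj("2.6.1")` is false, while the 3.2.1 in the current build passes.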
Now, how to verify a fix of an intermittent problem...
I estimate the fault was previously occurring in around 10% of builds, so it could probably be forced to occur by wrapping the pytest invocation in a bash for loop (say, 50 repeats), if further investigation were warranted.
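The loop described above could be sketched like this; the pytest target path is an assumption taken from the issue thread, so substitute the real one:

```shell
# Sketch: brute-force reproduction of an intermittent (~10%) failure by
# re-running a command in a loop, stopping at the first failure.
repeat() {
  n=$1; shift
  for i in $(seq 1 "$n"); do
    "$@" || { echo "failed on attempt $i"; return 1; }
  done
  echo "all $n attempts passed"
}

# usage (hypothetical test path): repeat 50 pytest tests/test_api.py::test_wit
```

At a 10% failure rate, 50 clean repeats would make a surviving fault quite unlikely (roughly 0.9^50, about half a percent).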
Tentatively closing as an upstream issue; build is currently using pyproj 3.2.1.
Intermittent build test failure, where `test_api.py::test_wit` elicits a `UnicodeDecodeError` (invalid start bytes for utf8) from `pyproj`. The trace involves witprocess `cal_area`, dask core, and datacube geometry `__hash__` and `to_wkt`. Previously noted in https://github.com/opendatacube/datacube-wps/issues/122#issuecomment-921424699