Hi, here are my notes on what I have tried so far to improve the behaviour of the Dask part of the pipeline on the example tile (i.e. getting good CPU utilisation, little transfer overhead reported in the dashboard, and no worker timeout errors).
Trying without work stealing (https://distributed.dask.org/en/latest/work-stealing.html)
DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING="False" python -m geofabrics --instruction instruction_lidar.json
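For reference, the same setting can also be flipped from Python via dask.config before the cluster/client is created; a minimal sketch of the in-code equivalent (not something geofabrics does at the moment):

    import dask

    # in-code equivalent of DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING="False";
    # only takes effect if set before the scheduler/client is started
    dask.config.set({"distributed.scheduler.work-stealing": False})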
Trying to change priority (https://distributed.dask.org/en/stable/priority.html)
with dask.annotate(priority=10):
    dem["z"] = dem.z.where(mask, 0)
Trying to pre-scatter the mask used to clip the dem in add_lidar (for the foreshore)
from distributed.client import default_client
client = default_client()  # reuse the already-running client
mask = client.scatter(mask)  # pre-scatter so every task reuses the same copy of the mask
Trying to get more reliable comms
DASK_DISTRIBUTED__COMM__RETRY__COUNT=3 python -m geofabrics --instruction instruction_lidar.json
Trying to remove all clipping/masking before self.raw_dem.save_dem(cached_file, ...)
Testing with different masking in add_lidar (only running up to self.raw_dem.save_dem(cached_file, ...), and still without most rio.clip calls)
xarray.ones_like(dem.z, dtype=bool) | (dem.z <= 0)
(dem.z <= 0) | mask, with the mask not lazy
(dem.z <= 0) | mask, with a broadcast mask, which failed with:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
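For context, this looks like the error recent NumPy versions raise when asked to build one array out of pieces with mismatched shapes; a minimal, hypothetical reproduction (not the actual geofabrics code, just an illustration of the error class):

    import numpy as np

    a = np.zeros((3, 4))
    b = np.zeros((3, 5))  # deliberately mismatched second dimension

    try:
        # stacking differently shaped arrays raises the same message:
        # "setting an array element with a sequence. The requested array has an
        # inhomogeneous shape after 1 dimensions. The detected shape was (2,) +
        # inhomogeneous part."
        np.array([a, b])
    except ValueError as error:
        print(error)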
(dem.z <= 0) | mask, with the mask rebuilt as a chunked Dask array via dask.array.block:
mask = ~clip_mask(dem.z, buffered_foreshore.geometry, None)
arrs = []
for i in range(0, mask.shape[0], self.chunk_size):
    sub_arrs = []
    for j in range(0, mask.shape[1], self.chunk_size):
        chunk = mask[i : i + self.chunk_size, j : j + self.chunk_size].copy()
        sub_arrs.append(chunk)
    arrs.append(sub_arrs)
mask = dask.array.block(arrs)
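As an aside, if the mask is an in-memory 2-D array, the same chunked Dask array could presumably be built in one call with dask.array.from_array; a sketch with placeholder inputs (untested against the geofabrics code):

    import dask.array
    import numpy as np

    # placeholder stand-ins for the real mask and chunk size
    mask = np.zeros((1000, 1200), dtype=bool)
    chunk_size = 256

    # split the mask lazily into chunk_size x chunk_size blocks, matching the
    # chunking produced by the manual dask.array.block loop above
    mask = dask.array.from_array(mask, chunks=(chunk_size, chunk_size))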
- transfer overhead, timeout messages
- masking all dem components / clip_mask without compute
- some timeouts and transfer overhead, not too bad
But when adding clip_mask back everywhere, transfer is bad (a significant amount compared to compute) and the dask workers hit timeout errors :(.
TODO export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="60s"
TODO check scheduling https://distributed.dask.org/en/latest/scheduling-policies.html
@jennan refactor to try to address the Dask failures
Specifically:
- Create mask arrays instead of repeatedly clipping against geometries
- Revisit the save/load
- Add a no_values_mask property (rough sketch below)
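A rough sketch of what a cached no_values_mask property could look like, assuming the DEM is held as an xarray Dataset with a z variable (class and attribute names are placeholders, not the actual geofabrics API):

    import xarray


    class DemWrapper:
        """Illustrative stand-in for the geofabrics DEM class."""

        def __init__(self, dem: xarray.Dataset):
            self._dem = dem
            self._no_values_mask = None

        @property
        def no_values_mask(self):
            """Boolean mask of no-data cells, built once and cached so the DEM is
            not repeatedly clipped against geometries."""
            if self._no_values_mask is None:
                # stays lazy for dask-backed DataArrays; call .load() if an eager
                # numpy mask is needed
                self._no_values_mask = self._dem.z.isnull()
            return self._no_values_mask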