rosepearson / GeoFabrics

A package for generating hydrologically conditioned DEMs and roughness maps from LiDAR and other infrastructure data. Check the wiki for install and usage instructions, and documentation at https://rosepearson.github.io/GeoFabrics/
GNU General Public License v3.0
25 stars 11 forks source link

Refactor rawdem #224

Closed rosepearson closed 6 months ago

rosepearson commented 7 months ago

@jennan refactor to try address Dask failures

Specifically: Create mask arrays instead of repeatably clipping against geometries. Revisit the save/load Add a no_values_mask property

jennan commented 7 months ago

Hi, here are my notes about what I have tried so far, in order to improve the behavior of the dask part on the example tile (i.e. getting good cpu utilisation, little transfer overhead reported in the dashboard and no worker timeout errors).

Trying without work stealing (https://distributed.dask.org/en/latest/work-stealing.html)

DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING="False" python -m geofabrics --instruction instruction_lidar.json

Trying to change priority (https://distributed.dask.org/en/stable/priority.html)

with dask.annotate(priority=10):
    dem["z"] = dem.z.where(mask , 0)

Trying to pre-scatter the mask to clip dem in add_lidar (for foreshore)

from distributed.client import default_client
mask = client.scatter(mask)

Trying to get more reliable comms

DASK_DISTRIBUTED__COMM__RETRY__COUNT=3 python -m geofabrics --instruction instruction_lidar.json

Trying to remove all clipping/masking before self.raw_dem.save_dem(cached_file, ...)

Testing with different masking in add_lidar (just testing up to self.raw_dem.save_dem(cached_file, ...) and still without most rio.clip)

arrs = [] for i in range(0, mask.shape[0], self.chunk_size): sub_arrs = [] for j in range(0, mask.shape[1], self.chunk_size): chunk = mask[i : i + self.chunk_size, j : j + self.chunk_size].copy() sub_arrs.append(chunk) arrs.append(sub_arrs) mask = dask.array.block(arrs)


  - transfer overhead, timeout messages
- masking all dem components / clip_mask without compute
  - some timeouts and transfer overhead, not too bad

But adding clip_mask back everywhere, transfer is bad (significant amount vs. compute) and timeout errors for dask workers :(.

TODO export DASK__DISTRIBUTED__COMM__TIMEOUTS__CONNECT="60s"
TODO check scheduling https://distributed.dask.org/en/latest/scheduling-policies.html