Closed: rosepearson closed this issue 1 year ago.
Notes: I quickly looked into doing this and saw some odd behaviour. It seemed to load some of the LiDAR files many times, and it took much longer than the explicit compute approach.
INFO:root:The output the coordinate system EPSG values of {'horizontal': 2193, 'vertical': 7839} will be used. If these are not as expected. Check both the 'horizontal' and 'vertical' values are specified.
INFO:root:Downloading vector layers [51153] from the linz dataservice
WARNING:fiona._env:One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1. This warning will not be emitted anymore.
INFO:root:The LiDAR dataset Wellington_2013 is assumed to have the source coordinate system EPSG: {'horizontal': 2193, 'vertical': 7839} as defined in the instruction file
INFO:root:Preparing [2, 2] chunks
INFO:root: Chunk [0, 0]
INFO:root: Chunk [0, 1]
INFO:root: Chunk [1, 0]
INFO:root: Chunk [1, 1]
INFO:root:Reading all 6 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089041.laz
INFO:root:Reading all 8 files in chunk.
INFO:root:Reading all 4 files in chunk.
INFO:root:Reading all 6 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089041.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091041.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_088041.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_088038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_088039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090041.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_088040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_088040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Incorporting Bathymetry: ['C:/Users/pearsonra/Documents/data/Bathymetry/Waikanae/lds-depth-contour-polyline-hydro-190k-1350k-SHP.zip!depth-contour-polyline-hydro-190k-1350k.shp']
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Creating offshore interpolant
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
INFO:root:Reading all 8 files in chunk.
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090038.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_089039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090040.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_090039.laz
INFO:root: Loading in file ot_CL1_WLG_2013_1km_091039.laz
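The log above shows the same eight tiles (089038 through 091039) being read repeatedly, once for each later step. One plausible cause, which is an assumption and not confirmed in this thread, is that without an explicit compute each downstream result re-executes the shared, lazily defined loading tasks whenever it is computed on its own. The minimal dask.delayed sketch below illustrates the effect. load_tile, mean_elevation, max_elevation and count_tile_loads are hypothetical stand-ins, not GeoFabrics functions.

```python
import dask

# Hypothetical counter so we can see how often the "expensive" load step runs.
load_calls = {"count": 0}


@dask.delayed
def load_tile(name):
    """Stand-in for reading one LiDAR tile; the real loader is not shown here."""
    load_calls["count"] += 1
    return [1.0, 2.0, 3.0, 4.0]


@dask.delayed
def mean_elevation(points):
    return sum(points) / len(points)


@dask.delayed
def max_elevation(points):
    return max(points)


def count_tile_loads(compute_together: bool) -> int:
    """Build two results from one lazily loaded tile and report how many loads ran."""
    load_calls["count"] = 0
    tile = load_tile("tile_a.laz")
    mean_z, max_z = mean_elevation(tile), max_elevation(tile)
    with dask.config.set(scheduler="synchronous"):
        if compute_together:
            # One shared graph: the load task is executed once and reused.
            dask.compute(mean_z, max_z)
        else:
            # Two independent computes: each one re-executes the load task.
            mean_z.compute()
            max_z.compute()
    return load_calls["count"]


print(count_tile_loads(compute_together=False))  # 2
print(count_tile_loads(compute_together=True))   # 1
```

Computing related results together with dask.compute(...), or calling .persist() on a shared intermediate, are the usual ways to avoid this kind of recomputation.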
@jennan has agreed to provide guidance on this issue. Depending on what we find, there are various ways to proceed.
The first things to do, however, are:
Caught up with @jennan, who showed off the powers of the Dask profiler. Tasks for @rosepearson to do before the next meeting:
Based on how the large problem behaves, we may change the Dask worker configuration and then address lazy compute.
Weirdly, I get the following error when trying to load the Miniconda3 module:
Hi @rosepearson, on wsg001 (and any Maui ancillary node), you need to load the NeSI module after module purge, because there it unloads everything (on Mahuika it doesn't unload the NeSI module):
module purge
module load NeSI
module load Miniconda3
Thanks @jennan. You might want to update the documentation at https://support.nesi.org.nz/hc/en-gb/articles/360001580415-Miniconda3 with a note that you may need to load the NeSI module if you are on a Maui ancillary node (unless access to these is limited to NIWA only, which I'm now thinking it might be).
@rosepearson, this node is indeed limited to NIWA.
@rosepearson Actually, you are right: this is worth mentioning for the NeSI ancillary nodes (which wsg001 is not ;-)). I'll update the documentation.
Looking at running a large example over Wellington.
First, a medium example with the same data and configuration, just over a smaller region. This used a chunk size of 100 and a memory limit of 10 GiB.
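A sketch of how the Dask worker configuration could be set up for runs like these, assuming a single-node dask.distributed LocalCluster. start_local_client is a hypothetical helper, and the values passed to it are illustrative rather than the settings actually used here. Note that memory_limit applies to each worker, not to the cluster as a whole.

```python
from dask.distributed import Client, LocalCluster


def start_local_client(n_workers: int, memory_limit: str, processes: bool = True) -> Client:
    """Start a local Dask cluster with one thread per worker and a per-worker memory cap."""
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        processes=processes,        # separate worker processes by default
        memory_limit=memory_limit,  # per worker; as a worker nears it, Dask spills
                                    # to disk and may pause or restart that worker
        dashboard_address=":0",     # pick any free port for the dashboard
    )
    return Client(cluster)
```

For example, start_local_client(n_workers=20, memory_limit="20GiB") would give 20 single-threaded workers with 20 GiB each, which matches the client reported with the crash later in this thread (processes=20 threads=20, memory=400.00 GiB).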
For the large example, with the same data and configuration over a larger region, I get stuck with an unresponsive dashboard. I wonder if this is because we still have an explicit compute, so perhaps it is trying to allocate too much memory for the full DEM? I also get heaps of the following error. I used chunk sizes of:
@rosepearson, it is likely an issue with the number of Dask tasks. If so, you can try to:
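To get a feel for the trade-off between the number of Dask tasks and the memory per chunk, this sketch counts chunks and estimates memory for a square-chunked float64 grid, and compares task-graph sizes for two chunk sizes. The helper names and the example region are hypothetical. The real GeoFabrics graph has more steps per chunk, so its task count would be larger than this toy pipeline's.

```python
import math

import dask.array as da


def chunk_summary(ny: int, nx: int, chunk: int, itemsize: int = 8) -> dict:
    """Summarise how a (ny, nx) grid splits into square chunks of side `chunk`."""
    n_chunks = math.ceil(ny / chunk) * math.ceil(nx / chunk)
    return {
        "n_chunks": n_chunks,
        "chunk_mib": chunk * chunk * itemsize / 2**20,  # memory of one full chunk
        "full_grid_gib": ny * nx * itemsize / 2**30,    # memory if computed all at once
    }


def graph_size(ny: int, nx: int, chunk: int) -> int:
    """Number of tasks Dask builds for a simple per-chunk pipeline plus a reduction."""
    z = da.zeros((ny, nx), chunks=(chunk, chunk))
    result = (z + 1.0).sum()
    return len(result.__dask_graph__())
```

For a hypothetical 10 km x 10 km region at 1 m resolution (10,000 x 10,000 cells), chunk_summary(10_000, 10_000, 100) gives 10,000 chunks of about 0.08 MiB each, while the full float64 grid is about 0.75 GiB per layer. Larger chunks cut the task count at the cost of more memory per task.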
@jennan, I finally got a crash on the SLURM job. I've copied the text below for your interest (from the SLURM job output file). I'll comment here when I have a repo you can use to run the code yourself.
Traceback (most recent call last):
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 595, in close
await self.kill(timeout=timeout)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 386, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 819, in kill
await process.join(max(0, deadline - time()))
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/process.py", line 316, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2022-10-18 11:57:46,662 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 595, in close
await self.kill(timeout=timeout)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 386, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/nanny.py", line 819, in kill
await process.join(max(0, deadline - time()))
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/process.py", line 316, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
Traceback (most recent call last):
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/main.py", line 191, in <module>
main()
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/main.py", line 187, in main
launch_processor(args)
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/main.py", line 162, in launch_processor
run_processor_class(
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/main.py", line 117, in run_processor_class
runner.run()
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/geofabrics/processor.py", line 578, in run
self.raw_dem.add_lidar(
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/geofabrics/dem.py", line 1327, in add_lidar
dem = self._add_tiled_lidar_chunked(
File "/scale_wlg_persistent/filesets/project/niwa03440/geofabrics/GeoFabrics/src/geofabrics/dem.py", line 1414, in _add_tiled_lidar_chunked
#chunked_dem = chunked_dem.compute()
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/xarray/core/dataset.py", line 901, in compute
return new.load(**kwargs)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/xarray/core/dataset.py", line 735, in load
evaluated_data = da.compute(*lazy_data.values(), **kwargs)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/dask/base.py", line 600, in compute
results = schedule(dsk, keys, **kwargs)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/client.py", line 3057, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/client.py", line 2226, in gather
return self.sync(
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/utils.py", line 339, in sync
return sync(
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/utils.py", line 406, in sync
raise exc.with_traceback(tb)
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/utils.py", line 379, in f
result = yield future
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/nesi/project/niwa03440/conda/envs/geofabrics/lib/python3.10/site-packages/distributed/client.py", line 2089, in _gather
raise exception.with_traceback(traceback)
distributed.scheduler.KilledWorker: Attempted to run task ('elevation_over_chunk-from-value-load_tiles_in_chunk-concatenate-8516fa3f6ef560e5bd6cf02c82d18ec1', 20, 31) on 3 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:34046. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
<Client: 'tcp://127.0.0.1:45727' processes=20 threads=20, memory=400.00 GiB>
Just a note on changes I'm stashing for now:
@jennan, here is a readme for setting up GeoFabrics on the HPC for testing: readme
Currently, compute is called in the dem module after the chunked DEM is created (see the code screen capture). We could reduce the memory load by not calling compute and instead calling dense_dem.to_netcdf(...). Rather than loading all of the dense_dem contents into memory in one go, chunks could then be loaded, processed and saved individually, reducing the overall memory footprint (particularly important for large catchments at fine resolutions).
Considerations: saving the dense DEM before generating the offshore DEM values.
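A minimal sketch of the to_netcdf idea, which also covers saving the dense DEM before the offshore step and reopening it lazily. It assumes xarray with dask and a netCDF backend are installed. build_lazy_dem and save_then_reopen are hypothetical, and the synthetic elevations stand in for the real chunked LiDAR gridding.

```python
import os
import tempfile

import dask.array as da
import numpy as np
import xarray as xr


def build_lazy_dem(ny: int, nx: int, chunk: int) -> xr.Dataset:
    """Hypothetical stand-in for the chunked dense DEM: nothing is computed yet."""
    rows = da.arange(ny, chunks=chunk, dtype=float)[:, None]
    cols = da.arange(nx, chunks=chunk, dtype=float)[None, :]
    z = rows * 1000.0 + cols  # lazy array with (chunk, chunk) blocks
    return xr.Dataset(
        {"z": (("y", "x"), z)},
        coords={"y": np.arange(ny, dtype=float), "x": np.arange(nx, dtype=float)},
    )


def save_then_reopen(dem: xr.Dataset, path: str, chunk: int) -> xr.Dataset:
    """Write the lazy DEM to disk instead of calling compute, then reopen it lazily."""
    # Dask evaluates the chunks as they are written. With a chunk-aware backend
    # such as netCDF4 the whole grid should not need to be in memory at once
    # (the scipy backend, by contrast, buffers each variable before writing).
    dem.to_netcdf(path)
    # Later steps (e.g. the offshore interpolant) can read this file chunk by
    # chunk instead of re-running the LiDAR loading tasks.
    return xr.open_dataset(path, chunks={"y": chunk, "x": chunk})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        dem = build_lazy_dem(ny=300, nx=200, chunk=100)
        path = os.path.join(tmp_dir, "dense_dem.nc")
        with save_then_reopen(dem, path, chunk=100) as reopened:
            print(reopened["z"].data)  # a dask array: values are read on demand
```

Reopening with chunks keeps the saved DEM lazy, so the offshore step only pulls in the chunks it touches.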
Areas to address from @rosepearson: