jcus0006 / mtdcovabm

Distributed COVID-19 agent-based model built on Maltese data.

Dask Worker error #8

Open jcus0006 opened 11 months ago

jcus0006 commented 11 months ago

image

The error appears to come from the future.result() call. The crash happens while evaluating the result, possibly because the worker had already crashed. However, the error also indicates that the worker tried to report the failure remotely and could not find the log file path to write to.
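
To narrow down where the failure originates, the result() call could be wrapped so that the exception is captured on the client side instead of propagating. The following is only a sketch: `safe_result` and `failing_task` are hypothetical names, and the standard-library `concurrent.futures` executor is used as a stand-in, on the assumption that Dask futures expose the same `result(timeout=...)` method.

```python
import traceback
from concurrent.futures import ThreadPoolExecutor


def safe_result(future, timeout=None):
    """Return (True, value) on success or (False, formatted traceback) on failure."""
    try:
        return True, future.result(timeout=timeout)
    except Exception:
        # Keep the full traceback as text so it can be printed or stored
        # on the client, independently of any log path on the worker.
        return False, traceback.format_exc()


def failing_task():
    # Hypothetical task standing in for a worker-side failure.
    raise RuntimeError("simulated worker failure")


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        ok, info = safe_result(pool.submit(failing_task))
    print("succeeded" if ok else "failed:\n" + info)
```

This does not fix the missing log path, but it separates the two problems: the task failure itself and the secondary failure while trying to log it.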

Some ideas:

  1. Create the log path on each worker and then manually inspect the Logs folder on each node (this is manageable when testing with 2 machines but becomes infeasible with tens of nodes)
  2. Look at the worker logs while running (these may uncover issues that were not previously visible)
  3. Come up with some way to gather the logs back on the scheduler node (this seems like the best option)
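
A rough sketch of ideas 1 and 3, assuming the `client` argument is a `dask.distributed.Client` (whose `run` method executes a function on every worker and returns a dict keyed by worker address). The helper names, the log file path and the destination folder are hypothetical and not taken from the repository.

```python
import os


def ensure_log_dir(log_dir):
    """Create the log directory if it is missing and return its absolute path."""
    os.makedirs(log_dir, exist_ok=True)
    return os.path.abspath(log_dir)


def read_log_tail(log_file, max_bytes=65536):
    """Return up to the last max_bytes of a log file as text, or None if absent."""
    if not os.path.isfile(log_file):
        return None
    with open(log_file, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - max_bytes))
        return handle.read().decode("utf-8", errors="replace")


def gather_logs(client, log_file, dest_dir):
    """Pull the tail of log_file from every worker into dest_dir on this node."""
    # Assumption: client.run(func, *args) returns {worker_address: return_value}.
    tails = client.run(read_log_tail, log_file)
    ensure_log_dir(dest_dir)
    written = {}
    for address, text in tails.items():
        if text is None:
            continue  # this worker has no log file at that path
        safe_name = address.replace("://", "_").replace(":", "_").replace("/", "_")
        target = os.path.join(dest_dir, safe_name + ".log")
        with open(target, "w", encoding="utf-8") as out:
            out.write(text)
        written[address] = target
    return written
```

With a real cluster this might be used as `client.run(ensure_log_dir, "logs")` before the simulation starts and `gather_logs(client, "logs/worker.log", "collected_logs")` afterwards; both paths are placeholders. Only the tail of each file is transferred, to keep the payload small as the number of nodes grows.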

jcus0006 commented 11 months ago

Tried option 2. A 500 Internal Server Error is returned when trying to access the logs of the remote worker node. This is likely because the Dask installation on that machine is out of date (e.g. Bokeh). Re-generating the Linux image would probably make more sense than reusing the existing one.
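
Before rebuilding the image, the suspected version mismatch could be checked directly. The sketch below is only illustrative: `installed_version` and `find_mismatches` are hypothetical helpers, and the intended use assumes a `dask.distributed.Client` whose `run` method returns a dict keyed by worker address.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(reference, versions_by_worker):
    """Return the workers whose reported version differs from the reference."""
    return {
        address: version
        for address, version in versions_by_worker.items()
        if version != reference
    }
```

On a running cluster this might look like `find_mismatches(installed_version("bokeh"), client.run(installed_version, "bokeh"))`, which would list any worker node whose Bokeh version differs from the one on the machine running the client. Dask's own `Client.get_versions` may also be worth a look for the same purpose.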