spatialmodel / inmap

InMAP reduced-form air quality model for fine particulate matter (PM2.5)
GNU General Public License v3.0

Ignore transient errors when checking status #97

Closed: navravi closed this 3 years ago

navravi commented 3 years ago

When polling the job status, there are occasionally errors like the following (observed on Windows and macOS):

b'rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 35.232.224.150:443: connectex: No connection could be made because the target machine actively refused it."\n'
b'rpc error: code = Unavailable desc = transport is closing\n'
b'rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: read tcp 10.27.37.17:53842->35.232.224.150:443: wsarecv: An existing connection was forcibly closed by the remote host."\n'

Ignoring these transient errors allows us to eventually get the simulation output, rather than giving up and abandoning the job. This is especially helpful for longer simulations where the chance of one of the status calls failing is higher.
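As a rough illustration of the polling behaviour described above, here is a minimal sketch. It is not the code from this PR: `wait_for_job`, the `check_status` callable, the `"running"` status string, and the optional `max_consecutive_errors` cap are all hypothetical names introduced for the example. It assumes that a failed status call surfaces as `subprocess.CalledProcessError`.

```python
import subprocess
import time


def wait_for_job(check_status, poll_interval=10.0,
                 max_consecutive_errors=None, sleep=time.sleep):
    """Poll until the job is no longer running, ignoring failed status calls.

    check_status is a hypothetical callable that returns a status string and
    raises subprocess.CalledProcessError when the underlying command fails.
    """
    consecutive_errors = 0
    while True:
        try:
            status = check_status()
        except subprocess.CalledProcessError as err:
            consecutive_errors += 1
            # Optional safety cap (an assumption, not described in the PR):
            # give up only after several failures in a row.
            if (max_consecutive_errors is not None
                    and consecutive_errors >= max_consecutive_errors):
                raise
            print("ignoring transient status error:", err.output)
            sleep(poll_interval)
            continue
        consecutive_errors = 0  # a successful call resets the counter
        if status != "running":  # "running" is an assumed status value
            return status
        sleep(poll_interval)
```

With `max_consecutive_errors=None` (the default in this sketch) every failed status call is ignored, which matches the behaviour proposed here; the cap is only a possible guard against an endpoint that never recovers.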

When starting the job, however, it makes sense to propagate the CalledProcessError – as is done when retrieving the output and deleting the job – instead of returning None.
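The contrast with job start-up could look roughly like the sketch below. `start_job` and its `cmd` argument are hypothetical stand-ins for illustration, not the wrapper's actual function; the only behaviour relied on is that `subprocess.check_output` raises `CalledProcessError` on a non-zero exit status.

```python
import subprocess
import sys


def start_job(cmd):
    """Run a (hypothetical) job-start command and return its output as text.

    check_output raises CalledProcessError on a non-zero exit status. The
    exception is deliberately not caught here, so the caller sees the failure
    instead of receiving None.
    """
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode()


if __name__ == "__main__":
    # Stand-in command that fails with exit status 3, to show the propagation.
    try:
        start_job([sys.executable, "-c", "import sys; sys.exit(3)"])
    except subprocess.CalledProcessError as err:
        print("start failed with exit status", err.returncode)
```

Letting the exception propagate means a job that never started is reported immediately, rather than being polled as if it existed.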

Tested by re-running the Jupyter notebook accompanying the blog post (sr_example.ipynb).

Please take a look.

coveralls commented 3 years ago

Coverage Status

Coverage remained the same at 68.935% when pulling cd11f38521b158f6947a3dbcd1de39397ea75712 on navravi:patch-1 into ee84ff451b79e26405e00128457cb6d0f15be5a3 on spatialmodel:master.
