Open inglesp opened 7 months ago
The only place we look up `.name` on a span in our own code is here:
I think this is probably caused by the `prepare_for_reboot` script not setting up tracing, meaning that we don't have a real tracer object. The jobrunner service sets up tracing by calling `jobrunner.tracing.setup_default_tracing`: https://github.com/opensafely-core/job-runner/blob/22f9fd5eb25280061d386178304d8de9e0174f83/jobrunner/service.py#L33
We should ensure that tracing is set up by this script (and any others), and we should consider being defensive against it not being set up, perhaps by writing a wrapper for `get_tracer`.
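As a rough illustration of the defensive option, here is a minimal sketch that reads a span's name without raising when the span is a no-op one. It assumes the failure comes from reading `.name` on a span object that lacks that attribute. The two `Fake...` classes and the `span_name` helper are hypothetical stand-ins written for this example; they are not part of job-runner or OpenTelemetry, and this is not the wrapper for `get_tracer` itself.

```python
import logging

logger = logging.getLogger(__name__)


class FakeNonRecordingSpan:
    """Hypothetical stand-in for a no-op span: it has no `name` attribute."""


class FakeRecordingSpan:
    """Hypothetical stand-in for a real span that records its name."""

    def __init__(self, name):
        self.name = name


def span_name(span, default="<untraced>"):
    """Return span.name if the span has one, otherwise a default.

    Logs a warning instead of raising AttributeError, so that a missing
    tracing setup cannot interrupt the caller.
    """
    name = getattr(span, "name", None)
    if name is None:
        logger.warning("span %r has no name; was tracing set up?", span)
        return default
    return name
```

With a helper along these lines, `span_name(FakeRecordingSpan("RUN"))` gives back the recorded name, while `span_name(FakeNonRecordingSpan())` falls back to the default and logs a warning. The warning keeps the missing setup visible, which still matters because the underlying fix is to set up tracing in the script.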
I ran `just jobrunner/stop` and then `just jobrunner/prepare-for-reboot` at the start of the maintenance window for https://github.com/opensafely-core/sysadmin/issues/168.
Several tracebacks were logged to the screen. Unfortunately I didn't capture them before the server was rebooted, so I do not have a complete record.
As far as I could tell, there was one traceback per job. The tracebacks were caught and logged from `finish_current_job`: https://github.com/opensafely-core/job-runner/blob/22f9fd5eb25280061d386178304d8de9e0174f83/jobrunner/tracing.py#L117-L130
The exception message was: `AttributeError: 'NonRecordingSpan' object has no attribute 'name'`.
However, I don't have a record of where the exception was raised from.
As far as I can tell, the logs do not indicate a problem with stopping the job or changing its state, only that the change of state could not be traced.