func statusFromError(err error) Status {
	switch {
	case err == nil:
		return StatusDone
	case errors.Is(err, context.Canceled):
		return StatusStopped
	case errors.Is(err, context.DeadlineExceeded):
		return StatusWaiting
	default:
		return StatusError
	}
}
This means that if a task ends with the following error:
"get repair target: create repair plan: calculate max host intensity: 172.19.96.244: get total memory: context deadline exceeded"
SM would mistake this error for the task running out of its maintenance window, because it matches context.DeadlineExceeded. This results not only in an incorrect task status (StatusWaiting instead of StatusError), but also in incorrect rescheduling of the task.
To fix this, SM shouldn't rely on general context errors (via WithDeadline); it should check only for SM-specific errors (via WithDeadlineCause).
Discovered in https://github.com/scylladb/scylla-enterprise/issues/4285: the scheduler checks whether a task ended with an error, was paused, or ran out of its maintenance window by matching the returned error, as in statusFromError above.
cc: @karol-kokoszka