Preventing patients being "lost" / race condition

thigger commented 5 years ago

I'm modelling a hospital, and patients must always have at least one ward bed seized (they will briefly have two when changing). If a ward bed is closed (set_capacity) then the patient must not fall into a queue (so queue_size_strict is used). However a patient should be able to queue for a bed once they already have one (for example moving out of the emergency dept into a ward).

The suggestion was to have a queue size of zero, and use a reject trajectory on the seize to increase the queue size and retry the seize.

However, this results in a race condition whereby a second patient can enter the queue in the time between the successful seize and the queue size being decreased again.

The approach I am currently using is to add a flag to control queueing behaviour with the seize (PR #198). This won't solve other possible races - eg if a ward closure causes a patient to drop into a queue briefly, but in that case when the queue is decreased they will go via handle_unfinished in the same way that they would have if the queue size had been decremented already.

An alternative may be creative use of handle_unfinished, but willqueue seems to be working well for me at present - and I'm happy to tweak as required.

I've knocked up an example - I'm sure it could be made smaller but at least it's reproducible.

race condition test - Copy.R.txt

Enchufa2 commented 5 years ago

Does it help if you increase the priority (using set_prioritization) in the reject trajectory?

thigger commented 5 years ago

Adding:

set_prioritization(c(1,-1,-1)) %>%

to the start of the reject trajectory in the example, makes no difference.

Does prioritization affect execution order as well as queuing?

Enchufa2 commented 5 years ago

Prioritization only affects queueing. Sorry, I didn't have the time to have a look at the code yet, and I misread your issue here (I thought it was pre-seize, but the race condition actually is post-seize).

Anyways, the main problem here is that you have two simultaneous events here, am I right?

I don't like the willqueue solution very much. It seems to hacky to me, because it solves a very limited issue, and not others, as you said.

thigger commented 5 years ago

I've become quite a fan of willqueue (though honestly mostly because it fixes my issue!). My reason being that I think it's generally useful to be able to control queueing behaviour for each arrival, rather than by resource (this is what I was originally after when you suggested the workaround of increasing queue size and retrying the seize). Would it be preferable to have an activity that turns on and off queueing for an arrival? (similar to renege_in)

In terms of race conditions, I can see a few:

(scenario)

Arrival A initial seize (fails)
Arrival A increments queue size
Arrival A seize (successful after queueing)
Arrival A decrements queue size

Arrival B enters queue between 3 and 4, and is dropped by 4. Solved by "willqueue"
Arrival B enters queue when it is pre-empted out of a bed (or bed closure) between 3 and 4. Not solved, but results in B being dropped at 4 rather than immediately as should have happened, so no real problem
Arrival B enters queue between 2 and 3, resulting in A being rejected. Solved by "willqueue"
Arrival B enters queue when it is pre-empted out of a bed (or bed closure) between 2 and 3. Results in A being rejected and therefore dropped, and B stuck in the queue. This is the worst-case, isn't solved by willqueue and results in the queue size being permanently increased by 1.

That last race condition is a real potential problem, and I'm unsure exactly how to fix it. I'm not that familiar with r-simmer's scheduling yet - though I presume it might be quite complex to solve race conditions entirely by making sure the multitasking is more pre-emptive in nature? (ie an arrival continues uninterrupted until it hits something that causes it to pause eg a timeout, seize, or wait)

thigger commented 5 years ago

(obviously there are various fixes involving abuses of handle_unfinished() - but they feel potentially quite hacky, and with the last example, arrival B is stuck in the queue anyway)

Enchufa2 commented 5 years ago

What if we add an argument queue_min_priority to add_resource so that only arrivals with a minimum priority value can be enqueued (otherwise, and if there's no server available, rejected)?

thigger commented 5 years ago

I think this would work well and cover most cases. Of my 4 potential race conditions (the bullet points above, not numbered).

I'm assuming that if the arrival's priority is below min_priority then it won't queue during a seize activity, or when pre-empted/dropped due to reduced capacity.

a) solved as B can have a low priority and not enter the queue b) solved in most circumstances. If B queued to get its bed then its priority may still be high briefly so it could enter the queue. However still no real problem as it will be dropped when queue is decremented. c) solved as B will not enter the queue d) mostly solved. I can see a set of conditions where it might still occur which I'll describe below - however it has to be ridiculously rare (and you might say impossible if the way R-simmer handles multitasking doesn't allow it):

Arrival A: Initial seize (fails) Increase priority to allow queueing Increment queue RACE EVENT Seize <should succeed after queueing, but fails as B has dropped into it> (Decrement queue - never reached) (Decrease priority to prevent queueing - never reached)

Arrival B: Initial seize (fails) Increase priority to allow queueing Increment queue Seize (succeeds) Decrement queue RACE EVENT <Capacity decreases here and B is dropped, lands in the queue which A has just increased> Drop priority to prevent queueing

I appreciate this requires 3 separate trajectories to align perfectly to make it happen, so might be near impossible to actually occur!

It may be that switching the order of the queue size change and the priority change could cure this too.

Overall I think your idea is better than mine; it covers more cases. Are there any potential side-effects of using priority in this way? What if someone wants an arrival to have a high priority for pre-emption but not to enter queues? (or is this a vanishingly rare use-case? I'm pretty sure it doesn't affect me!)

Enchufa2 commented 5 years ago

The thing is that, with this change, there's no need to increase and decrease the queue size anymore. You just set a queue size and a minimum priority and then you play with the arrivals' priority value.

Even if there's a minimum priority to enter the queue, if there's room in the server, a low-priority arrival will be served. Then, if preemption occurs, if queue_size_strict=FALSE, then the low-priority arrival goes to a special queue to be handled later; but if queue_size_strict=TRUE, then the low-priority arrival is dropped.

Enchufa2 commented 5 years ago

Are there any potential side-effects of using priority in this way? What if someone wants an arrival to have a high priority for pre-emption but not to enter queues?

Mmmmh... There's a potential use case there. Having a maximum priority would solve it. We could add queue_max_priority, or maybe replace both arguments with something like queue_priority, defining a range and by default c(0, Inf)?

thigger commented 5 years ago

The thing is that, with this change, there's no need to increase and decrease the queue size anymore. You just set a queue size and a minimum priority and then you play with the arrivals' priority value.

Sorry - getting stuck in my thinking! So long as the behaviour also affects queuing when an arrival is pre-empted or ejected due to capacity drops then I can happily leave all my ward queues at Inf.

I did spot one potential race condition with the current fix though (you may say it's not possible, but I don't know the multitasking details), however it's probably of limited importance and may even be desirable behaviour. It's presumably possible for another trajectory to reduce the capacity after an arrival has successfully seize()d the resource but before it has reduced the priority to prevent queuing. The arrival will presumably then re-enter the queue. However, this seems appropriate as it presumably will just look like it queued a bit longer. For my own use case I just need to make sure I reduce the priority before releasing the old ward.

The other thing I need to look into is what happens if a patient is queuing for a new ward and is pre-empted from the old one. As the priority will have been raised in order to allow queueing for the new ward, would they now end up in both queues? For me this would be undesirable behaviour (I'd prefer them to go down handle_unfinished).

Enchufa2 commented 5 years ago

Sorry - getting stuck in my thinking! So long as the behaviour also affects queuing when an arrival is pre-empted or ejected due to capacity drops then I can happily leave all my ward queues at Inf.

It does when queue_size_strict=TRUE, as I believe it's your case.

I did spot one potential race condition with the current fix though (you may say it's not possible, but I don't know the multitasking details), however it's probably of limited importance and may even be desirable behaviour. It's presumably possible for another trajectory to reduce the capacity after an arrival has successfully seize()d the resource but before it has reduced the priority to prevent queuing. The arrival will presumably then re-enter the queue. However, this seems appropriate as it presumably will just look like it queued a bit longer. For my own use case I just need to make sure I reduce the priority before releasing the old ward.

So one arrival seizes a resource and decrements its priority to prevent queueing after preemption, and another one decrements the capacity, all at the same time, right? So technically the first arrival didn't seize the resource, because it's in and out instantaneously.

Anyway, there's an internal execution priority (for activities, different from arrivals' one) that accounts for the most important race conditions (I believe, except for possible bugs). But I don't think there's a perfect priority system, because some complicated (and thus rare) race conditions are ambiguous: in different use cases would require a different execution order (I cannot think of an example right now, but trust me: at some point I hit one and stopped trying to perfect it). And most race conditions are resolved simply by not using oversimplified arrival and service times (i.e., integers or fractionals).

The other thing I need to look into is what happens if a patient is queuing for a new ward and is pre-empted from the old one. As the priority will have been raised in order to allow queueing for the new ward, would they now end up in both queues? For me this would be undesirable behaviour (I'd prefer them to go down handle_unfinished).

Yes, they end up in both queues! (CuriousIy enough, I never considered this case. It raises all kinds of problems that deserve their separate issues...). But if the old ward had a maximum queueing priority, then the patient can be rejected once preempted (rejected... from both queues, but this won't happen now = one bug) and then go down to handle_unfinished.

It may also happen that they end up in both queues and then the new ward is available (!). They shouldn't be able to seize the new ward and continue the trajectory while they are enqueued in a previous resource! (and currently they would = another bug).

r-simmer / simmer

Preventing patients being "lost" / race condition #202