sasha-s / go-deadlock

Online deadlock detection in go (golang)
Apache License 2.0
1.03k stars 76 forks source link

panic in OnPotentialDeadlock callback #29

Open atishaya11 opened 1 year ago

atishaya11 commented 1 year ago

Hi,

Raising a request to check if there are plans to support panic in OnPotentialDeadlock callback. Default behaviour is a os.Exit. If I override the callback function, panicking in callback is not supported as internally unlock for lockOrder mutex is not deferred which creates issues with further execution of the program.

Also checking if it was a conscious call to not support panic in callback.

My usecase is to achieve something as below so that all the go routine are not impacted apart from the one where the deadlock could have come up.

deadlock.Opts.OnPotentialDeadlock = onPotentialDeadlock
func onPotentialDeadlock () {
    panic("potential deadlock detected")
}

Please let me know if there are any follow up questions.

Thanks

atishaya11 commented 1 year ago

Hi @sasha-s, Can you please help here ?

sasha-s commented 1 year ago

How do you expect panic to work from within OnPotentialDeadlock?

It's being called from a separate goroutine when DeadlockTimeout is triggered, so in that case panic would terminate the program.

What do you want to do after panic? The mutexes (in your code) that were involved in a deadlock will stay locked, so next time you try to lock them, you will get into a new deadlock.

atishaya11 commented 1 year ago

Thanks for your response @sasha-s.

So we were using earlier version of the library in which I feel the OnPotentialDeadlock callback is called on the go routine which is trying to take lock. Please correct me if I am missing something.

The thought process is if we are able to panic the go routine and appropriately recover it, only a particular go routine will be impacted avoiding the impact of os.Exit(). Definitely, not a solution but just a fallback to reduce the blast radius.

Also my understanding was that OnPotentialDeadlock is called when the deadlock could have occurred because the lock function wouldn't have acquired the lock.(Considering version v0.2.0 of the library) in all three cases:

Please do let me know there is anything being missed.