loonwerks / jkind

JKind - An infinite-state model checker for safety properties in Lustre
http://loonwerks.com/tools/jkind.html
Other
52 stars 32 forks source link

Added synchronization to fix NPE bug in MessageHandler #72

Closed mww-aws closed 2 years ago

mww-aws commented 2 years ago

MWW, 8/27/2022 This PR fixes an occasional NPE bug that we have seen in jkind in the MessageHandler class. Within the class, the receiveMessage and stopReceivingMessages methods run on different threads, leading to occasional NPE exceptions due to race conditions when solver is completing analyses.

NPEs happen when incoming is checked for null, but then receiveMessage thread is interrupted and incoming is set to null by stopReceivingMessages prior to incoming.add(message).

Note that other protected functions do not need to be synchronized because they are called on the same thread as stopReceivingMessages.

The fix was tested by running several of the tests in the jkind testing directory looking for possible deadlocks and performance changes. We also ran models that occasionally lead to NPEs that are internal. There is a slight performance hit using synchronized for message passing (a few % of execution, depending on the problem), and no deadlocks occurred. Furthermore, no NPEs were observed.

mww-aws commented 2 years ago

It may be more performant to just add a try/catch block around the incoming.add call to trap NPEs, then check whether incoming is null, but this is also a little bit messier, maintenance-wise, and I'd rather not generate the NPE at all.

agacek commented 2 years ago

Nice find. Thanks for tracking this down and making the fix.