Closed: pramit11 closed this issue 9 years ago
@pramit11 thanks for bringing this to our attention!
@timpokorny if this is just a matter of the resignMessage object not being sent via JGroups, I'm more than happy to take a look at it.
@michaelrfraser I'm not 100% sure. The fake resign message should only be generated and processed locally (not sent out as it would be in normal circumstances).
I have a funny feeling this is due to the way the JGroups suspect callback is handled. Off the top of my head, I think this method is called first when a connection is lost, and later, once the loss is confirmed, a new View turns up that no longer contains the lost member (a sort of confirmation). We may be doing, or failing to do, something in the suspect process. So the problem could be in the code, or in the JGroups stack configuration (which is embedded in the jar, so it will be under resources). It needs a bit of quick JGroups reading, if you're up for it.
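To make the suspect-then-new-View sequence described above concrete, here is a minimal Java sketch of tracking suspicions and treating only a subsequent view change as confirmation of a departure. It is a hypothetical illustration, not Portico or JGroups code: `MembershipTracker`, `onSuspect` and `onViewAccepted` are invented names, and members are plain strings rather than JGroups address or view types.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tracker: not Portico or JGroups code, members are plain strings.
class MembershipTracker {
    private final Set<String> currentMembers = new LinkedHashSet<>();
    private final Set<String> suspected = new HashSet<>();

    MembershipTracker(List<String> initialMembers) {
        currentMembers.addAll(initialMembers);
    }

    // Step 1: a suspicion only marks the member; it is not yet confirmed as gone.
    void onSuspect(String member) {
        if (currentMembers.contains(member)) {
            suspected.add(member);
        }
    }

    boolean isSuspected(String member) {
        return suspected.contains(member);
    }

    // Step 2: a new view arrives; members missing from it are confirmed departures.
    Set<String> onViewAccepted(List<String> newMembers) {
        Set<String> departed = new LinkedHashSet<>(currentMembers);
        departed.removeAll(newMembers);
        currentMembers.clear();
        currentMembers.addAll(newMembers);
        // Suspicions confirmed by this view are resolved, so stop tracking them.
        suspected.removeAll(departed);
        // The caller would generate a local-only resign for each departed member.
        return departed;
    }
}
```

The design choice here is to act only on the view change: a suspicion can be a false alarm (for example a slow member), so generating the resign from the suspicion alone could remove a federate that is still alive.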
I've submitted a PR for this issue here https://github.com/openlvc/portico/pull/128
PR merged into master
Summary
When running a federation execution with time regulation enabled on the federates (using the example federate), if one of the federates crashes, the remaining federates are no longer granted time advances. Ideally, when a federate crash is detected, the federation should resign the crashed federate and continue the execution.
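Why a crashed regulating federate stalls everyone can be seen in a simplified model of conservative time management: a constrained federate may only advance while the requested time stays below the minimum of (logical time + lookahead) over the other regulating federates. The sketch below is a hypothetical model, not Portico's implementation; `TimeManagerSketch` and its methods are invented names, and it uses a single strict `<` comparison, whereas the exact boundary rules differ between the HLA time advance services.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of conservative time management (not Portico code).
class TimeManagerSketch {
    // Regulating federate name -> its current logical time plus lookahead.
    private final Map<String, Double> regulatingBounds = new HashMap<>();

    void updateRegulating(String federate, double logicalTime, double lookahead) {
        regulatingBounds.put(federate, logicalTime + lookahead);
    }

    // Resigning removes the federate's bound, so it no longer holds others back.
    void resign(String federate) {
        regulatingBounds.remove(federate);
    }

    // Smallest bound among the other regulating federates (infinity if none).
    double galtFor(String requester) {
        double galt = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : regulatingBounds.entrySet()) {
            if (!e.getKey().equals(requester)) {
                galt = Math.min(galt, e.getValue());
            }
        }
        return galt;
    }

    // Simplified grant check using a strict comparison.
    boolean canGrant(String requester, double requestedTime) {
        return requestedTime < galtFor(requester);
    }
}
```

For example, with `fedA` at time 10 and a crashed federate frozen at time 5 (both with lookahead 1), `fedA` cannot be granted time 7 because the bound is 6; once the crashed federate is resigned, the bound disappears and the grant can proceed.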
The following log is generated:
At the code level, in FederationListener.java, a resign message is created for the crashed federate, but it is never dispatched.
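The gap described above, a message that is built but never handed to anything that processes it, can be illustrated with a small hypothetical sketch. `ResignNotice`, `CrashHandlerSketch` and the local queue are invented for illustration and are not Portico's FederationListener API; the sketch only assumes, following the earlier comment, that the synthetic resign is processed locally rather than sent over the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a resign notification about a crashed federate.
final class ResignNotice {
    final String federateName;

    ResignNotice(String federateName) {
        this.federateName = federateName;
    }
}

// Hypothetical crash handler: not Portico code.
class CrashHandlerSketch {
    // Messages waiting to be processed by the local pipeline.
    final Queue<ResignNotice> localIncoming = new ArrayDeque<>();

    // Called once a member's departure has been confirmed.
    void onFederateCrashed(String federateName) {
        ResignNotice notice = new ResignNotice(federateName);
        // Creating the notice is not enough: it must be handed to the local
        // processing path, otherwise nothing ever acts on it. It is queued
        // locally only and never broadcast.
        localIncoming.add(notice);
    }

    // Drains the local queue and returns the resigned federate names in order.
    List<String> processLocal() {
        List<String> resigned = new ArrayList<>();
        ResignNotice n;
        while ((n = localIncoming.poll()) != null) {
            resigned.add(n.federateName);
        }
        return resigned;
    }
}
```

In this sketch the fix for "created but not dispatched" is the single `localIncoming.add(notice)` call; dropping it reproduces the reported behaviour, where the notice exists but the crashed federate is never resigned.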
Environment and Logs