usnistgov / ucef-meta

WebGME Federate and Experiment Designer
MIT License
3 stars 3 forks source link

Feature 5 #11

Closed MartyBurns closed 6 years ago

MartyBurns commented 6 years ago

This is a companion pull request to the changes in ucef-core for SynchronizedFederate. Together they add the ability of federates to exit gracefully -- that is with orderly takedown and exit through the main function.

tpr1 commented 6 years ago

This feature breaks due to the latest changes in develop for reasons I have been unable to figure out.

I am still looking into what causes it to break, but it will not be a fast merge.

tpr1 commented 6 years ago

I think there is a problem with these lines:

https://github.com/usnistgov/ucef-meta/blob/6de6275d1a379816e531c880ed8b9d9f63694f42/src/plugins/C2Federates/Templates/java/federateimpl.java.ejs#L172-L174

Java federates send 1 additional advance time request after receipt of SimEnd. This is probably due to one of these lines. It's apparently not as simple as putting an if statement around these lines, as that causes the code to freeze before t = 1.

Here is an example of the additional time request:

INFO  [Thread-1] portico.lrc: PENDING Requested time advance for [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to [6.0], waiting for grant...
DEBUG [Thread-1] portico.lrc: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: @REMOTE Received INTERACTION [720] with parameters [697, 698, 699, 700] @5.1
DEBUG [Thread-1] portico.lrc: Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to time [6.0]
DEBUG [Thread-1] portico.lrc: @REMOTE Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] ACHIEVED synchronization point [readyToResign]
DEBUG [Thread-1] portico.lrc: (veto) still waiting for federation to synchronize on point [readyToResign]
DEBUG [Thread-1] portico.lrc: @REMOTE Received INTERACTION [723] with parameters [697, 698, 699, 700] @5.2
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to time [6.0]
20:49:22.036 [FederateProject.Receiver.main()] INFO  org.cpswt.hla.SynchronizedFederate - Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066: SimEnd interaction received, exiting...
20:49:22.036 [FederateProject.Receiver.main()] DEBUG FederateProject.Receiver - in checkReceivedSubscriptions
20:49:22.036 [FederateProject.Receiver.main()] WARN  FederateProject.Receiver - unhandled interaction: InteractionRoot.C2WInteractionRoot.SimulationControl.SimEnd
20:49:22.037 [FederateProject.Receiver.main()] INFO  org.cpswt.hla.SynchronizedFederate - Exiting gracefully ....
DEBUG [FederateProject.Receiver.main()] portico.lrc: ATTEMPT Send interaction [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
INFO  [FederateProject.Receiver.main()] portico.lrc: SUCCESS Sent interaction [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
DEBUG [Thread-1] portico.lrc: REQUEST Time advance request for [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to [7.0]
INFO  [Thread-1] portico.lrc: PENDING Requested time advance for [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to [7.0], waiting for grant...
DEBUG [Thread-1] portico.lrc: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] requests ADVANCE to time [7.0]
DEBUG [Thread-1] portico.lrc: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] requests ADVANCE to time [7.0]
DEBUG [Thread-1] portico.lrc: @REMOTE Received INTERACTION [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
DEBUG [Thread-1] portico.lrc: DISCARD interaction of class [706]: no subscription
DEBUG [Thread-1] portico.lrc: (veto) Interaction ignored: not subscribed
DEBUG [FederateProject.Receiver.main()] portico.lrc: ATTEMPT Resign federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] from federation [FederateProject]: action=DELETE_OBJECTS
DEBUG [FederateProject.Receiver.main()] portico.lrc.jgroups: REQUEST resignFederation: federate=Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066, federation=FederateProject
DEBUG [Regular] portico.lrc.jgroups: (GMS) resignFederation(vagrant-8738)
INFO  [Regular] portico.lrc.jgroups: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] has resigned from [FederateProject]
INFO  [FederateProject.Receiver.main()] portico.lrc.jgroups: SUCCESS Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] resigned from [FederateProject]
DEBUG [FederateProject.Receiver.main()] portico.lrc: Messaging framework configuration complete
INFO  [FederateProject.Receiver.main()] portico.lrc: LRC initialized (HLA version: HLA13)
INFO  [FederateProject.Receiver.main()] portico.lrc: Opening LRC Connection
INFO  [FederateProject.Receiver.main()] portico.lrc: SUCCESS Resigned federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] from federation [FederateProject]: action=DELETE_OBJECTS
20:49:27.129 [FederateProject.Receiver.main()] INFO  FederateProject.Receiver - Done.

The federate (Receiver) receives SimEnd at t=6 and tries to exit using the resign interaction [706]. It then goes on to request t=7 despite having processed SimEnd.

This can cause errors:

INFO  [Thread-1] portico.lrc: PENDING Requested time advance for [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to [6.0], waiting for grant...
DEBUG [Thread-1] portico.lrc: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: @REMOTE Received INTERACTION [720] with parameters [697, 698, 699, 700] @5.1
DEBUG [Thread-1] portico.lrc: Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] requests ADVANCE to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to time [6.0]
DEBUG [Thread-1] portico.lrc: @REMOTE Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] ACHIEVED synchronization point [readyToResign]
DEBUG [Thread-1] portico.lrc: (veto) still waiting for federation to synchronize on point [readyToResign]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] to time [6.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] to time [6.0]
20:49:22.035 [FederateProject.Sender.main()] INFO  org.cpswt.hla.SynchronizedFederate - Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36: SimEnd interaction received, exiting...
DEBUG [FederateProject.Sender.main()] portico.lrc: ATTEMPT Send interaction [723] with parameters [697, 698, 699, 700] @6.2
INFO  [FederateProject.Sender.main()] portico.lrc: SUCCESS Sent interaction [723] with parameters [697, 698, 699, 700] @6.2
DEBUG [Thread-1] portico.lrc: REQUEST Time advance request for [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to [7.0]
INFO  [Thread-1] portico.lrc: PENDING Requested time advance for [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to [7.0], waiting for grant...
DEBUG [Thread-1] portico.lrc: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] requests ADVANCE to time [7.0]
20:49:22.042 [FederateProject.Sender.main()] INFO  org.cpswt.hla.SynchronizedFederate - Exiting gracefully ....
DEBUG [FederateProject.Sender.main()] portico.lrc: ATTEMPT Send interaction [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
INFO  [FederateProject.Sender.main()] portico.lrc: SUCCESS Sent interaction [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
DEBUG [Thread-1] portico.lrc: @REMOTE Received INTERACTION [706] with parameters [707, 708, 709, 697, 698, 699, 700] (RO)
DEBUG [Thread-1] portico.lrc: DISCARD interaction of class [706]: no subscription
DEBUG [Thread-1] portico.lrc: (veto) Interaction ignored: not subscribed
DEBUG [Thread-1] portico.lrc: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] requests ADVANCE to time [7.0]
ERROR [FederateProject.Sender.main()] portico.lrc: Currently ticking
hla.rti.ConcurrentAccessAttempted: Currently ticking serial:0
    at org.portico.impl.hla13.Impl13Helper.checkAccess(Impl13Helper.java:171)
    at org.portico.impl.hla13.Rti13Ambassador.processMessage(Rti13Ambassador.java:5806)
    at org.portico.impl.hla13.Rti13Ambassador.resignFederationExecution(Rti13Ambassador.java:412)
    at org.cpswt.hla.SynchronizedFederate.resignFederationExecution(SynchronizedFederate.java:546)
    at org.cpswt.hla.SynchronizedFederate.exitGracefully(SynchronizedFederate.java:1331)
    at FederateProject.Sender.execute(Sender.java:84)
    at FederateProject.Sender.main(Sender.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
    at java.lang.Thread.run(Thread.java:748)
DEBUG [Regular] portico.lrc.jgroups: (GMS) resignFederation(vagrant-8738)
DEBUG [Regular] portico.lrc: @REMOTE Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] resigned from federation [FederateProject]: action=DELETE_OBJECTS
DEBUG [Regular] portico.lrc: Queued dummy TimeAdvanceReqeust for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36]
INFO  [Regular] portico.lrc.jgroups: Federate [Receiver-51d2f67a-98ce-4a6a-b823-3a4783165066] has resigned from [FederateProject]
DEBUG [Thread-1] portico.lrc: @REMOTE Delete Object: object=4194302, federate=RTI (RO)
DEBUG [Thread-1] portico.lrc: Removed undiscovered object from repository [4194302]
20:49:27.109 [FederateProject.Sender.main()] WARN  org.cpswt.hla.SynchronizedFederate - WARNING:  problem encountered while resigning federation execution: Unknown exception received from RTI (class hla.rti.ConcurrentAccessAttempted) for resignFederationExecution(): Currently ticking | retrying #1
DEBUG [FederateProject.Sender.main()] portico.lrc: ATTEMPT Resign federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] from federation [FederateProject]: action=DELETE_OBJECTS
DEBUG [FederateProject.Sender.main()] portico.lrc.jgroups: REQUEST resignFederation: federate=Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36, federation=FederateProject
DEBUG [Regular] portico.lrc.jgroups: (GMS) resignFederation(vagrant-64325)
INFO  [Regular] portico.lrc.jgroups: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] has resigned from [FederateProject]
DEBUG [Regular] portico.lrc.jgroups: (GMS) resignFederation(vagrant-1297)
DEBUG [Regular] portico.lrc: @REMOTE Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] resigned from federation [FederateProject]: action=DELETE_OBJECTS
DEBUG [Regular] portico.lrc: Queued dummy TimeAdvanceReqeust for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36]
INFO  [Regular] portico.lrc.jgroups: Federate [FederationManager-79ae1e9c-9a4f-41ab-9c54-c7df2fc65a61] has resigned from [FederateProject]
DEBUG [Thread-1] portico.lrc: @REMOTE Delete Object: object=1, federate=RTI (RO)
DEBUG [Thread-1] portico.lrc: Removed undiscovered object from repository [1]
DEBUG [Thread-1] portico.lrc: ADVANCE (QUEUED) time-advance grant for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [7.0]
DEBUG [Thread-1] portico.lrc: ADVANCE (GRANTED) for federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] to time [7.0]
ERROR [FederateProject.Sender.main()] portico.lrc: TimeoutException
org.jgroups.TimeoutException: TimeoutException
    at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
    at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
    at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
    at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
    at org.jgroups.protocols.RSVP.down(RSVP.java:118)
    at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:328)
    at org.jgroups.protocols.pbcast.GMS.down(GMS.java:965)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
    at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
    at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
    at org.jgroups.protocols.pbcast.FLUSH.down(FLUSH.java:312)
    at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
    at org.jgroups.JChannel.down(JChannel.java:729)
    at org.jgroups.JChannel.send(JChannel.java:445)
    at org.portico.bindings.jgroups.channel.Channel.sendSyncControlMessage(Channel.java:280)
    at org.portico.bindings.jgroups.channel.Channel.sendResignFederation(Channel.java:244)
    at org.portico.bindings.jgroups.Federation.sendResignFederation(Federation.java:313)
    at org.portico.bindings.jgroups.JGroupsConnection.resignFederation(JGroupsConnection.java:307)
    at org.portico.lrc.services.federation.handlers.outgoing.ResignFederationHandler.process(ResignFederationHandler.java:79)
    at org.portico.utils.messaging.MessageSink.process(MessageSink.java:187)
    at org.portico.impl.hla13.Impl13Helper.processMessage(Impl13Helper.java:123)
    at org.portico.impl.hla13.Rti13Ambassador.processMessage(Rti13Ambassador.java:5822)
    at org.portico.impl.hla13.Rti13Ambassador.resignFederationExecution(Rti13Ambassador.java:412)
    at org.cpswt.hla.SynchronizedFederate.resignFederationExecution(SynchronizedFederate.java:546)
    at org.cpswt.hla.SynchronizedFederate.exitGracefully(SynchronizedFederate.java:1331)
    at FederateProject.Sender.execute(Sender.java:84)
    at FederateProject.Sender.main(Sender.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
    at java.lang.Thread.run(Thread.java:748)
20:49:37.633 [FederateProject.Sender.main()] WARN  org.cpswt.hla.SynchronizedFederate - WARNING:  problem encountered while resigning federation execution: Unknown exception received from RTI (class org.jgroups.TimeoutException) for resignFederationExecution(): TimeoutException | retrying #2
DEBUG [FederateProject.Sender.main()] portico.lrc: ATTEMPT Resign federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] from federation [FederateProject]: action=DELETE_OBJECTS
ERROR [FederateProject.Sender.main()] portico.lrc: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] not joined to [FederateProject]
org.portico.lrc.compat.JFederateNotExecutionMember: Federate [Sender-67ab2bd6-24a4-43e8-83dc-e5a3ae33cb36] not joined to [FederateProject]
    at org.portico.bindings.jgroups.JGroupsConnection.resignFederation(JGroupsConnection.java:303)
    at org.portico.lrc.services.federation.handlers.outgoing.ResignFederationHandler.process(ResignFederationHandler.java:79)
    at org.portico.utils.messaging.MessageSink.process(MessageSink.java:187)
    at org.portico.impl.hla13.Impl13Helper.processMessage(Impl13Helper.java:123)
    at org.portico.impl.hla13.Rti13Ambassador.processMessage(Rti13Ambassador.java:5822)
    at org.portico.impl.hla13.Rti13Ambassador.resignFederationExecution(Rti13Ambassador.java:412)
    at org.cpswt.hla.SynchronizedFederate.resignFederationExecution(SynchronizedFederate.java:546)
    at org.cpswt.hla.SynchronizedFederate.exitGracefully(SynchronizedFederate.java:1331)
    at FederateProject.Sender.execute(Sender.java:84)
    at FederateProject.Sender.main(Sender.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
    at java.lang.Thread.run(Thread.java:748)
20:49:38.142 [FederateProject.Sender.main()] WARN  org.cpswt.hla.SynchronizedFederate - WARNING:  While resigning federation:  federate not execution member.
20:49:38.145 [FederateProject.Sender.main()] INFO  FederateProject.Sender - Done.

I assume in this case the attempt to advance time to t=7 and process the incoming message queue overlapped with the attempt to either send the resign interaction or perform the actual resign. It still exits, but does so with a rather impressive series of errors.

One possible reason for the incompatibility with gateway federates could be this pending time advance request that comes when a gateway is trying to synchronize on readyToResign. Regardless - these particular errors happen between just two simple Java federates that exchange a single interaction that contains no data.