usnistgov / ucef-meta

WebGME Federate and Experiment Designer
MIT License

Allow federates to resign gracefully #5

Closed MartyBurns closed 6 years ago

MartyBurns commented 6 years ago

Version information

fa471820415e64d9deca0bc4981e754ba48c0852

Observed behavior

Currently, the default behavior of a synchronized federate is to handle the SimEnd interaction with this code in SynchronizedFederate.java:

protected void handleIfSimEnd(int interactionClass, ReceivedInteraction theInteraction, LogicalTime theTime) {
    if (SimEnd.match(interactionClass)) {
        logger.info("{}: SimEnd interaction received, exiting...", getFederateId());
        try {
            // getLRC().tick();
            getLRC().resignFederationExecution(ResignAction.DELETE_OBJECTS);
        } catch (Exception e) {
            logger.error("Error during resigning federate: {}", getFederateId());
            logger.error(e.getMessage());
        }

        // Wait for 10 seconds for Federation Manager to recognize that the federate has resigned.
        try {
            Thread.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
        } catch (Exception e) {
            logger.error(e.getMessage());
        }

        // TODO: CONSIDER SETTING UP A SHUTDOWN HOOK
        // this one will terminate the JVM not only the current process
        Runtime.getRuntime().exit(0);

        // Exit
        System.exit(0);
    }
}
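
The TODO in the handler above mentions a shutdown hook. As a minimal sketch of that idea, the block below registers a cleanup routine with the standard `Runtime.addShutdownHook` call, so that it would still run if the JVM is ended with `System.exit(0)`. The class `ShutdownHookSketch` and the methods `cleanup` and `installHook` are hypothetical names for illustration and are not part of ucef-core.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; none of these names come from ucef-core.
class ShutdownHookSketch {
    // Records whether cleanup already ran, so it runs at most once.
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Stand-in for federate-specific cleanup (closing files, flushing logs, ...).
    static void cleanup() {
        if (cleanedUp.compareAndSet(false, true)) {
            System.out.println("cleanup done");
        }
    }

    // Registers cleanup() to run when the JVM begins shutting down,
    // including a shutdown started by System.exit(0).
    static Thread installHook() {
        Thread hook = new Thread(ShutdownHookSketch::cleanup, "federate-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        installHook();
        // The hook runs after main returns and the JVM starts to shut down.
    }
}
```

A hook runs on its own thread during JVM shutdown, so anything it touches should be safe to use from another thread. It does not by itself give derived federates a place to run code before the resign call.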

Expected behavior

Federates derived from SynchronizedFederate should be able to exit gracefully, leaving room for their own cleanup before the process ends.

Steps to reproduce issue

Run any federation that includes a Java federate.

MartyBurns commented 6 years ago

When the SimEnd interaction is received by the federate code, the flag to exit should be set. This will cause the federate code to break out of the execute while loop.

This can be resolved by:

1) Adding the following to the -base.java file:

-- add override method

/**
 * Handles a SimEnd interaction. Overrides the SynchronizedFederate default behavior,
 * which resigns and then terminates the JVM, by only setting the exit flag.
 */
@Override
protected void handleIfSimEnd(int interactionClass, ReceivedInteraction theInteraction, LogicalTime theTime) {
    if (SimEnd.match(interactionClass)) {
        logger.info("{}: SimEnd interaction received, exiting...", getFederateId());

        // this one will set flag allowing foreground federate to gracefully shut down
        exitCondition = true;
    }
}

-- add exitGracefully method

2) In the federate implementation, adding the following to the execute method:

    // this is the exit condition of the following while loop
    // it is used to break the loop so that latejoiner federates can
    // notify the federation manager that they left the federation
    while (exitCondition == false) {
        currentTime += super.getStepSize();

        atr.requestSyncStart();
        enteredTimeGrantedState();

        ////////////////////////////////////////////////////////////////////////////////////////
        // TODO send interactions that must be sent every logical time step below.
        // Set the interaction's parameters.
        //
        //    ThePing vThePing = create_ThePing();
        //    vThePing.sendInteraction(getLRC(), currentTime);
        //
        ////////////////////////////////////////////////////////////////////////////////////////

        CheckReceivedSubscriptions("Main Loop");

        // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        // DO NOT MODIFY FILE BEYOND THIS LINE
        // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        AdvanceTimeRequest newATR = new AdvanceTimeRequest(currentTime);
        putAdvanceTimeRequest(newATR);
        atr.requestSyncEnd();
        atr = newATR;

        // example of when this federate determines it wants to resign
        if (currentTime >= 10.0) {
            selfResign = true;
        }

        // if we want to exit, set this flag in the body of this while loop
        if (selfResign){
            break;
        }

    }

    exitGracefully();

    // do your own cleanups
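
The flag-driven exit described above can be reduced to a small standalone sketch. It assumes the SimEnd handler may run on a different thread than the loop, which is why the flag is declared `volatile` here; the class `GracefulLoopSketch` and its methods are hypothetical stand-ins for the generated federate and do not use the ucef-core API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated federate; not the ucef-core API.
class GracefulLoopSketch {
    // volatile so a write from a callback thread is visible to the loop thread.
    private volatile boolean exitCondition = false;
    private double currentTime = 0.0;
    final List<String> events = new ArrayList<>();

    // Plays the role of handleIfSimEnd: it only records the request to stop.
    void onSimEnd() {
        exitCondition = true;
    }

    // Plays the role of execute(): step until asked to stop, then clean up once.
    void execute(double stepSize, double resignAt) {
        while (!exitCondition) {
            currentTime += stepSize;
            events.add("step " + currentTime);
            if (currentTime >= resignAt) {
                break; // the federate decides to resign on its own
            }
        }
        exitGracefully();
    }

    // Placeholder for the resign call and any federate-specific cleanup.
    void exitGracefully() {
        events.add("resigned");
    }

    public static void main(String[] args) {
        GracefulLoopSketch federate = new GracefulLoopSketch();
        federate.execute(1.0, 3.0);
        System.out.println(federate.events);
    }
}
```

Both exit paths, the external SimEnd flag and the federate's own break, end in the same `exitGracefully()` call, so cleanup code only has to live in one place.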

MartyBurns commented 6 years ago

Sometimes, as the federate exits, I get this:

13:56:01.686 [main] INFO  Ping.PingBase - Ping-2e7d8589-ba90-47de-8d74-6703de1bf362: SimEnd interaction received, exiting...
ERROR [main] portico.lrc: Currently ticking
hla.rti.ConcurrentAccessAttempted: Currently ticking serial:0
    at org.portico.impl.hla13.Impl13Helper.checkAccess(Impl13Helper.java:171)
    at org.portico.impl.hla13.Rti13Ambassador.processMessage(Rti13Ambassador.java:5806)
    at org.portico.impl.hla13.Rti13Ambassador.resignFederationExecution(Rti13Ambassador.java:412)
    at Ping.PingBase.exitGracefully(PingBase.java:137)
    at Ping.Ping.execute(Ping.java:109)
    at Ping.Ping.main(Ping.java:130)
13:56:01.704 [main] ERROR Ping.PingBase - Error during resigning federate: Ping-2e7d8589-ba90-47de-8d74-6703de1bf362
13:56:01.704 [main] ERROR Ping.PingBase - Unknown exception received from RTI (class hla.rti.ConcurrentAccessAttempted) for resignFederationExecution(): Currently ticking

I think this means the code should wait until the RTI is no longer ticking. Is that correct?

tpr1 commented 6 years ago

You cannot make a service call to the RTI while it is processing another request. The currently ticking error happens when multiple simultaneous calls are made to the RTI.

Some background. A synchronized federate maintains two threads: the user execution thread, and a time advancement thread. This was not a great design choice; the time advancement thread makes constant calls to the RTI which occur at non-obvious times. I assume the resign call sometimes happens during one of these times.

It is probably possible to re-arrange some method calls to make sure the time advance thread is idle when an exit occurs. It would require some sifting through this file:

https://github.com/usnistgov/ucef-core/blob/develop/cpswt-core/federate-base/src/main/java/org/cpswt/hla/base/AdvanceTimeThread.java
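
One way to picture the rearrangement suggested above is to stop the background thread and wait for it to finish before making the final call. The sketch below models that under stated assumptions: `FakeRti` is an invented class that rejects overlapping calls with an `IllegalStateException`, loosely imitating the "Currently ticking" failure, and is neither Portico nor the real AdvanceTimeThread. All names in it are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the problem; FakeRti is not Portico and all names are invented.
class ResignAfterJoinSketch {

    // Rejects overlapping calls, loosely mimicking ConcurrentAccessAttempted.
    static class FakeRti {
        private final AtomicBoolean busy = new AtomicBoolean(false);
        boolean resigned = false;

        private void enter() {
            if (!busy.compareAndSet(false, true)) {
                throw new IllegalStateException("Currently ticking");
            }
        }

        void tick() throws InterruptedException {
            enter();
            try {
                Thread.sleep(1); // pretend to process callbacks
            } finally {
                busy.set(false);
            }
        }

        void resign() {
            enter();
            try {
                resigned = true;
            } finally {
                busy.set(false);
            }
        }
    }

    // Stops the ticking thread, waits for it to finish, and only then resigns.
    static boolean runAndResign() {
        FakeRti rti = new FakeRti();
        AtomicBoolean keepTicking = new AtomicBoolean(true);
        Thread ticker = new Thread(() -> {
            try {
                while (keepTicking.get()) {
                    rti.tick();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "time-advance");
        ticker.start();
        try {
            Thread.sleep(20);       // let the ticker run for a while
            keepTicking.set(false); // ask it to stop
            ticker.join();          // wait until no tick() is in flight
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        rti.resign();               // no other thread touches rti at this point
        return rti.resigned;
    }

    public static void main(String[] args) {
        System.out.println("resigned cleanly: " + runAndResign());
    }
}
```

Calling `resign()` without the `join()` would race against an in-flight `tick()` and could fail intermittently, which matches the "sometimes" in the report. Whether the real time advance thread can be stopped this simply is an open question that needs a look at AdvanceTimeThread.java.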

tpr1 commented 6 years ago

If you want to use isTicking, see:

https://github.com/openlvc/portico/blob/7559c505bd4c5b62935a6677750458c6ec09f082/codebase/src/java/portico/org/portico/lrc/LRCState.java#L522

MartyBurns commented 6 years ago

I did some work with checkAccess() from RTIambassadorEX.helper(). While I was able to get this to run, it did not solve the problem. I still got the ticking fault on the resignFederationExecution() call.

MartyBurns commented 6 years ago

The problem seems to be resolved by a simpler exitGracefully:

    public void exitGracefully() throws hla.rti.ConcurrentAccessAttempted
    {

        // notify FederationManager about resign
        super.notifyFederationOfResign();

        // Wait for 10 seconds for Federation Manager to recognize that the federate has resigned.
        try {
            Thread.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
        } catch (Exception e) {
            logger.error(e.getMessage());
        }
    }

In the execute() while loop, simply break when you need to exit. The loop will then end and exitGracefully will be called, ending the federate.

MartyBurns commented 6 years ago

This version successfully calls resignFederationExecution like the SynchronizedFederate does (before exiting ungracefully):

    public void exitGracefully() throws hla.rti.ConcurrentAccessAttempted
    {

        // notify FederationManager about resign
        super.notifyFederationOfResign();

        try {
            // getLRC().tick();
            getLRC().resignFederationExecution(ResignAction.DELETE_OBJECTS);
        } catch (Exception e) {
            logger.error("Error during resigning federate: {}", getFederateId());
            logger.error(e.getMessage());
        }

        // Wait for 10 seconds for Federation Manager to recognize that the federate has resigned.
        try {
            Thread.sleep(CpswtDefaults.SimEndWaitingTimeMillis);
        } catch (Exception e) {
            logger.error(e.getMessage());
        }
    }

I have added this to the template for the federate code generation.

tpr1 commented 6 years ago

These changes create dead code in core.

Both exitCondition and handleIfSimEnd should really be changes to the core repo.

MartyBurns commented 6 years ago

I suppose that would be good if we want all implementations of SynchronizedFederate to have them. The existing code would only be dead for our Java code-generated projects, no?

What about exitGracefully?

MartyBurns commented 6 years ago

Now pursuing implementation of this feature in meta and core with SynchronizedFederate.

tpr1 commented 6 years ago

Fixed for Java federates in https://github.com/usnistgov/ucef-meta/pull/11