openlvc / portico

Portico is an open source, cross-platform, fully supported HLA RTI implementation. Designed with modularity and flexibility in mind, Portico is a production-grade RTI for the Simulation and Training Community, so come say hi!
http://www.porticoproject.org
152 stars 81 forks source link

Rapid Exchange of attribute ownership causes federates to lock up #166

Closed kataner83 closed 8 years ago

kataner83 commented 8 years ago

I habe a problem when testing the ownership management 1516e with two federates updating the same Attribute so that in each Step one federates asks for the ownership. IT Works for a short while then none of the federates gets the Ownership again.

timpokorny commented 8 years ago

G'day Onkelv,

A few questions:

Cheers, Tim

kataner83 commented 8 years ago

Hi Tim, I have a simple testcase with two federates updating the same stringatttribute. There is no exception thrown. Only an Output of the actual variable value Shows that after a few steps it does Not Change anymore. Then i debugged it and saw that none of the federates Has the ownership.

timpokorny commented 8 years ago

G'day mate - are you able to provide this?

Just to clarify, are you saying that the two federates successfully pass ownership back and forth for a few steps, but then at some point they both cease to have ownership (and thus one can no longer hand it off to the other)?

kataner83 commented 8 years ago

Exactly this is what i meant. By providing you mean the logfile or the testcase. The logfile should ne no Problem as soon as i have Access to my pc again. Is There a General procedure to follow the ownership aquisition or an example for the Java 1516e ownership management?

timpokorny commented 8 years ago

G'day mate - if you can provide the test case so I can reproduce that would be very much appreciated. I think we're right on what is going wrong now, but trying to figure out why will be key to getting it fixed. If I can repro I can watch it as it gets up to that point where ownership state is lost and figure out what is happening.

Unfortunately I don't have an example fed for 1516e ownership management. I shall have to port some of the old v1.3 test suite over to get proper coverage up and running. I sense some differences have crept in. Cheers, Tim

kataner83 commented 8 years ago

Hi, i have to say sorry but i can not provide the testcase in a reasonable time. It is embedded in a bigger environment which is not easily detachable and not open source. I hope you can help nevertheless. Cheers, Volker

icemagno commented 8 years ago

Relax. I'm planning to make a tutorial with examples at my blog soon (ASAP). As soon I cover the basics, I'll get into this. I aways use functional code, so it will be tested.

http://sim.cmabreu.com.br/

timpokorny commented 8 years ago

@onkelv - I shall attempt to write a test case to reproduce this

@icemagno - Awesome! Your blog and guides look really cool

icemagno commented 8 years ago

I'm planning to start it in next weekend. Two federates already done:

https://github.com/icemagno/ownership

icemagno commented 8 years ago

Preliminary results.

After performing some tests I came to the conclusion that the problem is the speed of ownership request.

The tests were as follows : ( source code: https://github.com/icemagno/ownership )

1 Federate Aircraft ( test driver ) ( Log: https://github.com/icemagno/ownership/blob/master/aircraft-log.txt ) 1 Federate Tank ( Log: https://github.com/icemagno/ownership/blob/master/tank-log.txt )

The Federate Tank starts producing one (or more) tank objects.

====== TANK FEDERATE ======
Type:

 n + ENTER : New tank
 q + ENTER : Quit

n
> Tank CD7CE created.

The Federate Aircraft receives new Tank objects and waits for the user to press the "a" key to acquire the attribute "TempAttribute" of the Tank ( use the first Tank of list ) .

====== AIRCRAFT FEDERATE ======
Type:
 n + ENTER : New aircraft
 a + ENTER : Start ping-pong attribute test
 q + ENTER : Quit
> New Tank discovered
> Tank 2097153 sent this temp value: This is a Temp value

a
> Starting ping-pong attribute test...
> Publishing Tank temp atribute
> Done.
> ================== ITERATION 0==================
> Will try to update temp attribute for all Tanks
> Try to update attribute from object 2097153
> It is not mine.
> Requested TempAttribute from Tank CD7CE (2097153)
> Tank 2097153 sent this temp value: 4713C

The Federate Tank receives the request, test the possession of the attribute trying to update it and then allows the passage of the attribute. After that, test it again, attempting to update .

> Someone requested ownership of my Tank 2097153
> Testing ownership of all Tanks...
> Try to update attribute from object 2097153
> Updated.
> Giving ownership...
> Done! It is not my problem now.
> Testing ownership again...
> Try to update attribute from object 2097153
> It is not mine.

The Federate Aircraft receives notification that the attribute was passed and tests possession trying to update it .

> I now own the temp attibute from Tank 2097153. Try to update it.
> Try to update attribute from object 2097153
> Updated.

Just when the Federate tank passes ownership of the attribute, he changes his mind and asks again possession .

At the very moment that the Federate aircraft passes ownership of the attribute, he changes his mind and asks again possession .

The problem is here : https://github.com/icemagno/ownership/blob/master/aircraft/src/br/com/cmabreu/Main.java#L111 and here: https://github.com/icemagno/ownership/blob/master/tank/src/br/com/cmabreu/Main.java#L168

    public void attributeOwnershipDivestitureIfWanted(
            ObjectInstanceHandle theObject, AttributeHandleSet candidateAttributes, byte[] userSuppliedTag) {
        iteration++;
        log("> ================== FALCON ONE ZERO ONE==============");
        log("The Tank Federate wants its attribute back. Tank " + theObject );
        log("Testing ownership of all Tanks...");
        tankClass.updateTempValue();
        log("Giving ownership...");
        try {
            rtiamb.attributeOwnershipDivestitureIfWanted(theObject, candidateAttributes);
            log("Done! It is not my problem now.");
            log("Testing ownership again...");
            tankClass.updateTempValue();

            try {
                Thread.sleep(800);
            } catch ( Exception e ) {
                //
            }

            if ( iteration < NUM_ITERATIONS ) {
                log("================== ITERATION " + iteration  + "==================");
                log("But I WANT it back!!");
                TankObject to = tankClass.getTank( theObject );
                if ( to != null ) tankClass.acquireAttribute( to );
            }

        } catch ( Exception e ) {
            log("Cannot accept attribute release: " + e.getMessage() );
        }
    }

As you can see, there is a Sleep(800) between the attribute divestiture and the new acquisition request.

If this change of mind is too fast, I can confirm the issue, but with a value above +-800 it is safe.

icemagno commented 8 years ago

By the way, I was able to change ownership even with <ownership>NoTransfer</ownership> property (not DivestAcquire). Why?

            <attribute>
                <name>TempAttribute</name>
                <dataType>HLAunicodeString</dataType>
                <updateType>Conditional</updateType>
                <updateCondition>On Change</updateCondition>
                <ownership>NoTransfer</ownership>
                <sharing>PublishSubscribe</sharing>
                <dimensions/>
                <transportation>HLAreliable</transportation>
                <order>Receive</order>
                <semantics>This attribute will be requested by the Aircraft</semantics>
            </attribute>  
timpokorny commented 8 years ago

Excellent work! I'll take a look at the logging that is in place for ownership exchange, but as a first pass can I get you to turn the logging up to TRACE and run until error again?

Ref the FOM. Correct - Portico will ignore that guidance in the FOM. Federates are free to exchange at their will. There is little runtime use to the doco in the FOM beyond doco or instruction to code generators. Even observing that data in the FOM, a federate will always have to have been coded to behave the right way, making it ultimately redundant (for RTI at runtime).

On 18 Feb 2016, at 11:05 PM, Carlos Magno Oliveira de Abreu notifications@github.com<mailto:notifications@github.com> wrote:

Preliminary results.

After performing some tests I came to the conclusion that the problem is the speed of ownership request.

The tests were as follows : ( source code: https://github.com/icemagno/ownership )

1 Federate Aircraft ( test driver ) ( Log: https://github.com/icemagno/ownership/blob/master/aircraft-log.txt ) 1 Federate Tank ( Log: https://github.com/icemagno/ownership/blob/master/tank-log.txt )

The Federate Tank starts producing one (or more) tank objects.

====== TANK FEDERATE ====== Type:

n + ENTER : New tank q + ENTER : Quit

n

Tank CD7CE created.

The Federate Aircraft receives new Tank objects and waits for the user to press the "a" key to acquire the attribute "TempAttribute" of the Tank ( use the first Tank of list ) .

====== AIRCRAFT FEDERATE ====== Type: n + ENTER : New aircraft a + ENTER : Start ping-pong attribute test q + ENTER : Quit

New Tank discovered Tank 2097153 sent this temp value: This is a Temp value

a

Starting ping-pong attribute test... Publishing Tank temp atribute Done. ================== ITERATION 0================== Will try to update temp attribute for all Tanks Try to update attribute from object 2097153 It is not mine. Requested TempAttribute from Tank CD7CE (2097153) Tank 2097153 sent this temp value: 4713C

The Federate Tank receives the request, test the possession of the attribute trying to update it and then allows the passage of the attribute. After that, test it again, attempting to update .

Someone requested ownership of my Tank 2097153 Testing ownership of all Tanks... Try to update attribute from object 2097153 Updated. Giving ownership... Done! It is not my problem now. Testing ownership again... Try to update attribute from object 2097153 It is not mine.

The Federate Aircraft receives notification that the attribute was passed and tests possession trying to update it .

I now own the temp attibute from Tank 2097153. Try to update it. Try to update attribute from object 2097153 Updated.

Just when the Federate tank passes ownership of the attribute, he changes his mind and asks again possession .

At the very moment that the Federate aircraft passes ownership of the attribute, he changes his mind and asks again possession .

The problem is here : https://github.com/icemagno/ownership/blob/master/aircraft/src/br/com/cmabreu/Main.java#L111 and here: https://github.com/icemagno/ownership/blob/master/tank/src/br/com/cmabreu/Main.java#L168

public void attributeOwnershipDivestitureIfWanted(
        ObjectInstanceHandle theObject, AttributeHandleSet candidateAttributes, byte[] userSuppliedTag) {
    iteration++;
    log("The Tank Federate wants its attribute back. Tank " + theObject );
    log("Testing ownership of all Tanks...");
    tankClass.updateTempValue();
    log("Giving ownership...");
    try {
        rtiamb.attributeOwnershipDivestitureIfWanted(theObject, candidateAttributes);
        log("Done! It is not my problem now.");
        log("Testing ownership again...");
        tankClass.updateTempValue();

        try {
            Thread.sleep(800);
        } catch ( Exception e ) {
            //
        }

        if ( iteration < NUM_ITERATIONS ) {
            log("================== ITERATION " + iteration  + "==================");
            log("But I WANT it back!!");
            TankObject to = tankClass.getTank( theObject );
            if ( to != null ) tankClass.acquireAttribute( to );
        }

    } catch ( Exception e ) {
        log("Cannot accept attribute release: " + e.getMessage() );
    }
}

As you can see, there is a Sleep(800) between the attribute divestiture and the new acquisition request.

If this change of mind is too fast, I can confirm the issue, but with a value above +-800 it is safe.

— Reply to this email directly or view it on GitHubhttps://github.com/openlvc/portico/issues/166#issuecomment-185759281.

This message and any attachments are confidential and commercial-in-confidence, and are intended solely for the use of the individual or entity to whom the message is addressed. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, be advised that you have received this message in error and that any use, dissemination, forwarding, printing or copying of this message and any file attachments is strictly prohibited. Any views or opinions presented are solely those of the author and do not necessarily represent those of Calytrix Technologies. If you have received this message in error, please immediately notify us by reply message to the sender.

icemagno commented 8 years ago

Here. aircraft-800 and tank-800 are 800ms sleep. All goes fine. aircraft-nosleep and tank-nosleep are no sleep.

At first look I've found this error in aircraft Federate:

DEBUG [ImmediateCallbackDispatcher] portico.lrc: ATTEMPT Release Attributes [743] of object [2097153]
DEBUG [ImmediateCallbackDispatcher] portico.lrc: org.portico.lrc.compat.JFederateWasNotAskedToReleaseAttribute: can't release attribute [743] of object [2097153]: release was not requested

ownership-logs.zip

icemagno commented 8 years ago

Hi Tim. The logs are attached to message above.

timpokorny commented 8 years ago

G'day Carlos, much obliged. I am looking through this now!

timpokorny commented 8 years ago

@icemagno - I can now reproduce the problem you have surfaced in your test code (thank-you!). I am seeing some strange things with regard to message ordering when it fails and am looking into whether it may be the root cause of this problem (the LRC may have been told it has acquired ownership, but wasn't in a state to record it properly at the time).

timpokorny commented 8 years ago

OK - So I think I have figured out what is causing this. I just have to work out a fix now.

Essentially the problem is in the provisions for what happens when a federate requests ownership of attributes that it believes to be unowned.

When a federate says it wants ownership of some attributes, it splits this request into "attributes that are unowned" and "attributes owned by someone else". For the second one it sends a release request and records that it is interested. For the first set they should just be able to acquire them right away.

A typical exchange works like this:

  1. (A) Request ownership (and record that request locally)
  2. (B) Receive this request
  3. (B) Release attributes
  4. (A) Hear's about release and checks if it has any outstanding requests that be fulfilled, which it does
    1. (A) Sends a notification out that it now owns the attributes

Basically, the problem is that the release is just a release to anyone. (A) then has to separately acquire them and send the notification out.

The problem is in step 1 and step 5. Basically, (B) starts requesting the attributes back as soon as it has released them, but at that stage it hasn't yet received notification that they were acquired by someone else, so it thinks they are unowned. This is causing it not to process the actual release (when it does come) because it never records a request to check against later.

It's all too complicated (and not helped by the HLA API trying to do too much), so I'll have to figure out a way to tighten this process up a bit so that it accounts for such situations.

icemagno commented 8 years ago

I think you may consider using a CRC... it may help to solve #162 too.

kataner83 commented 8 years ago

A small question are there any progresses on this problem ?

timpokorny commented 8 years ago

A small response, yes ( sorry :( )

I am slowly zeroing in on the problem. Thanks to @icemagno's test I can now reproduce this and have a unit test that also does this.

When I run that test I see the following in the log:

DEBUG [main] portico.lrc: ATTEMPT Acquire ownership of attributes [509, 510, 511] in object [2]
DEBUG [main] portico.lrc: Sent direct ownership acquisition request for attributes [509, 510, 511] of object [2] from federate [defaultFederate]
DEBUG [main] portico.lrc: (defaultFederate requesting ... [509, 510, 511]
DEBUG [main] portico.lrc: (defaultFederate) requesting=[]

The last two lines are the most interesting.

Basically, this means that the federate is forgetting that it has requested the attributes. Thus, when the release comes through it never picks them up (because it doesn't think it wants them).

Why this works one time and doesn't work the next I am not sure. Looking into that now.

icemagno commented 8 years ago

I'm happy to help. But please take a look in #162 too when you have some time. I have a frozen project waiting for this. But don't hurry too much!! Thanks.

timpokorny commented 8 years ago

Fix identified and all tests still passing. Merging into master.

timpokorny commented 8 years ago

Done. Now merged into master. A clean build from there should show this fix. If someone can confirm, or report back any issues that would be awesome.

Assuming no other issues with the nightly build, it should be available here tomorrow morning (Australian Western time that is): https://sourceforge.net/projects/portico/files/nightly/