Port federate code to use the immediate callback facilities rather than evoked

timpokorny commented 9 years ago

Story

{As a} developer of an open source RTI, {I want} my benchmarking suite to run as fast as humanly possible, {So that} I can test the limits of the RTI, not the benchmarking application

Context

The HLA defines a series of asynchronous services. In 1516e federates can control the way that they receive callbacks. Previously, you had to call tick() or evokeXxxCallbacks() to signal to the RTI when you were ready to process waiting messages. As of 1516e you can enable an "immediate" callback mode, which essentially delivers callbacks as soon as they are ready (done on a separate thread).

Part of the problem with the tick() style services is that you have very little control over how long you tie up the process for. The provided facilities in 1516e are:

evokeCallback( double minWaitTime ) - Process a single callback; if there are no messages to process, wait for at least the specified time
evokeMultipleCallbacks( double mintime, double maxtime ) - Process many callbacks, waiting for at least mintime but no longer than maxtime

The issue here is that when calling these methods I may block if there is no work to do, and I have to wait at least mintime for something to appear. This is time I could be using to do other work. In many circumstances this may not be important, but when you're working on an application that needs to push out information as fast as possible, the use of these services introduces an arbitrary delay. Playing with the loop-wait argument that the test federate takes lets you see this in action. The higher the value, the lower the throughput.
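To make the cost concrete, here is a toy Java model of an evoked loop. It is only a sketch: `EvokedLoopSketch`, its queue and `timeIdleLoop` are made-up names that stand in for the LRC, and none of it is the real HLA or Portico API. The point is simply that when there is nothing to process, every iteration pays the full wait.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy stand-in for an LRC callback queue. NOT the real HLA/Portico API.
class EvokedLoopSketch {
    private final BlockingQueue<Runnable> callbacks = new LinkedBlockingQueue<>();

    // Something arrived from another federate (hypothetical helper).
    void enqueue(Runnable callback) {
        callbacks.add(callback);
    }

    // Mimics evokeCallback(minWaitTime): run one queued callback if there is
    // one, otherwise block for the wait time. Returns true if a callback ran.
    boolean evokeCallback(double minWaitSeconds) {
        long nanos = (long) (minWaitSeconds * 1_000_000_000L);
        try {
            Runnable next = callbacks.poll(nanos, TimeUnit.NANOSECONDS);
            if (next == null) {
                return false;
            }
            next.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Runs a send/evoke loop against an empty queue; returns elapsed millis.
    static long timeIdleLoop(int iterations, double loopWaitSeconds) {
        EvokedLoopSketch lrc = new EvokedLoopSketch();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // the "send" work would go here
            lrc.evokeCallback(loopWaitSeconds);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("loop-wait 0.001s: " + timeIdleLoop(20, 0.001) + " ms");
        System.out.println("loop-wait 0.010s: " + timeIdleLoop(20, 0.010) + " ms");
    }
}
```

The second line printed by `main` should show a noticeably larger elapsed time than the first, which is the same effect loop-wait has on the test federate.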

Why not just blat everything out and not tick until the end!? No - idiot - you'll fill up your queue with messages from other federates and run out of memory. The queues inside an LRC need tending, and the tick()/evoke() calls are the formal handing over of power to the LRC to do that. Also, it's not really indicative of "real world" use.

So, what would be better is to use the immediate callback facilities that are available. In Portico - and I suspect every other RTI - this basically means that a background thread is running constantly, tending to the incoming messages and invoking calls on the FederateAmbassador. You no longer need to call the tick/evoke methods, nor do you need to worry about handing over time to tend the queue so that it doesn't grow too big. It's just done for you.

Ramifications? You can get FederateAmbassador calls at any time - including when you're right in the middle of doing other work, so be careful about any shared state.
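A minimal sketch of what "be careful" can look like in Java, assuming a counter that is touched both by the federate's main loop and by the callback thread. `SharedStateSketch`, `onCallback` and `onLocalSend` are hypothetical names; `onCallback` just stands in for any FederateAmbassador callback, and the thread here is a plain Java thread rather than the RTI's own.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of state shared between the main loop and a callback thread.
class SharedStateSketch {
    // Atomic so that concurrent increments from both threads are not lost.
    private final AtomicLong eventsSeen = new AtomicLong();

    // Stand-in for a FederateAmbassador callback; in immediate mode this
    // could be invoked on another thread at any moment.
    void onCallback() {
        eventsSeen.incrementAndGet();
    }

    // Work done by the federate's own main loop.
    void onLocalSend() {
        eventsSeen.incrementAndGet();
    }

    long total() {
        return eventsSeen.get();
    }

    // Hammers the counter from two threads and returns the final total.
    static long run(int perThread) {
        SharedStateSketch federate = new SharedStateSketch();
        Thread callbackThread = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                federate.onCallback();
            }
        }, "callback-delivery");
        callbackThread.start();
        for (int i = 0; i < perThread; i++) {
            federate.onLocalSend();
        }
        try {
            callbackThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return federate.total();
    }

    public static void main(String[] args) {
        System.out.println("total events: " + run(100_000));
    }
}
```

With a plain `long` and `++` in place of the `AtomicLong`, the total could come up short because increments from the two threads can overwrite each other. For anything richer than a counter, a lock or a concurrent collection would be the usual choice.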

It would be nice as part of this ticket to maintain the ability to use evoked callbacks if a particular command line argument was given. This would let us contrast performance under each style.

Thus ends this completely unnecessary and lengthy commentary on a ticket that could have read "move from evoked to immediate callbacks"

Acceptance Criteria

Once complete:

[x] The test federate will use the IMMEDIATE callback mode, rather than EVOKED
[x] A user can tell the federate to use EVOKED mode instead via a command line argument
[x] There will be a clear message logged at the beginning of a run that says what mode is in use
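The criteria above could be covered by something along these lines. This is a sketch, not the code that went into the test federate: the `--callback-mode` flag name, the `CallbackModeOption` class and the banner text are all assumptions made for illustration. The local `Mode` enum is a stand-in for the `CallbackModel` value that the 1516e Java API takes in `RTIambassador.connect()`.

```java
import java.util.Locale;

// Hypothetical command line handling for choosing the callback mode.
class CallbackModeOption {
    // Local stand-in for the RTI's callback model enum.
    enum Mode { IMMEDIATE, EVOKED }

    // Assumed flag name; the real argument may be called something else.
    static final String FLAG = "--callback-mode";

    // Returns the requested mode, defaulting to IMMEDIATE when the flag is
    // absent. An unknown value makes valueOf throw IllegalArgumentException.
    static Mode parse(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if (FLAG.equals(args[i])) {
                return Mode.valueOf(args[i + 1].toUpperCase(Locale.ROOT));
            }
        }
        return Mode.IMMEDIATE;
    }

    // The message to log once at the start of a run.
    static String banner(Mode mode) {
        return "Callback mode: " + mode;
    }

    public static void main(String[] args) {
        System.out.println(banner(parse(args)));
    }
}
```

Defaulting to IMMEDIATE keeps the fast path as the normal case, with EVOKED only there for comparison runs.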

Remaining Tasks:

[x] Fix problem where federate hangs on resign
[x] Remove the current "slow down" code in the Throughput Test (sleeps briefly every so often to slow things down slightly and prevent openlvc/portico#74 from happening)
[x] Update shell scripts for new command line arguments (and better doc for loop-wait)
[x] Fix crash in federate caused by trying to send when there are no threads available in the pool

timpokorny commented 9 years ago

Update added and merged over to master as part of PR #6.

Updated the task list in the main issue with the remaining tasks.

Fix problem where federate hangs on resign
Remove the current "slow down" code in the Throughput Test
Update shell scripts for new command line arguments (and better doc for loop-wait)

timpokorny commented 9 years ago

Fixed problem with federate hanging on resign. One small code change to disconnect on the part of the test federate as well as a patched Portico file.

"Removed" the slow-down code by taking the sleep away. It seems that just doing the modulo operation each tick is enough to ease the pressure ever so slightly. Dubious.

Now just have a problem with crashes when trying to send while the federate is overloaded. The pool rejects the submission of a job and this exception bubbles up into an RTIinternalError which kills execution. This looks like openlvc/portico#63.
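For anyone unfamiliar with this failure mode, the sketch below reproduces the general pattern with a plain `java.util.concurrent.ThreadPoolExecutor`. It is an assumption-laden toy: the pool size, queue size and the `PoolRejectionSketch` class are invented here and say nothing about how Portico actually configures its pool or how openlvc/portico#63 was fixed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Toy reproduction of a saturated pool rejecting new jobs.
class PoolRejectionSketch {
    // Submits `jobs` blocking tasks to a pool with one worker and a queue of
    // one slot, and counts how many submissions are rejected.
    static int countRejections(int jobs) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.AbortPolicy()); // default policy: throw on overload
        int rejected = 0;
        for (int i = 0; i < jobs; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                // Caught here; if left uncaught it would propagate to the caller.
                rejected++;
            }
        }
        release.countDown(); // let the accepted jobs finish
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejections(5));
    }
}
```

The first job occupies the single worker, the second fills the one queue slot, and every submission after that is rejected until work drains. Catching the exception at the submission site, as done here, is one way to stop it escaping as a fatal error; what to do next (retry, back off, drop) depends on the application.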

timpokorny commented 9 years ago

Have identified the root problem in openlvc/portico#63 and patched. Need to commit updated Portico jar with the change.
