openlvc / hperf

HLA Performance Testing Federate for IEEE-1516 (2010)

Port federate code to use the immediate callback facilities rather than evoked #5

Closed timpokorny closed 9 years ago

timpokorny commented 9 years ago

Story

{As a} developer of an open source RTI, {I want} my benchmarking suite to run as fast as humanly possible, {So that} I can test the limits of the RTI, not the benchmarking application

Context

The HLA defines a series of asynchronous services. In 1516e federates can control the way that they receive callbacks. Previously, you had to call tick() or evokeXxxCallbacks() to signal to the RTI when you were ready to process waiting messages. As of 1516e you can enable an "immediate" callback mode, which essentially delivers callbacks as soon as they are ready (done on a separate thread).

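Part of the problem with the tick() style services is that you have very little control over how long you tie up the process for. The provided facilities in 1516e are evokeCallback(), which takes a minimum wait time, and evokeMultipleCallbacks(), which takes a minimum and a maximum wait time.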

The issue here is that when calling these methods I may block if there is no work to do, and I have to wait at least mintime for something to appear. This is time I could be using to do other work. In many circumstances this may not be important, but when you're working on an application that needs to push out information as fast as possible, these services introduce an arbitrary delay. Playing with the loop-wait argument that the test federate takes lets you see this in action: the higher the value, the lower the throughput.
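The cost of an idle wait can be sketched with a toy model. This is only an illustration: EvokedCallbackSketch, enqueue, evoke and timeIdleLoop are hypothetical names, and the queue is a stand-in for an LRC, not the Portico or IEEE-1516e API. It assumes an evoke call that blocks for up to the minimum time when nothing is waiting.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for an LRC callback queue. Not the Portico or IEEE-1516e API. */
class EvokedCallbackSketch {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();

    /** Queue a callback, as the receiving side of an LRC might. */
    void enqueue(Runnable callback) {
        pending.add(callback);
    }

    /**
     * Hypothetical analogue of an evoke call: wait up to minSeconds for one
     * callback, run it if one arrives, and report whether anything was delivered.
     */
    boolean evoke(double minSeconds) {
        try {
            long waitNanos = (long) (minSeconds * 1_000_000_000L);
            Runnable callback = pending.poll(waitNanos, TimeUnit.NANOSECONDS);
            if (callback == null) {
                return false; // nothing arrived: the whole wait was spent idle
            }
            callback.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Run send-then-evoke cycles against an empty queue; returns elapsed nanoseconds. */
    static long timeIdleLoop(int iterations, double minSeconds) {
        EvokedCallbackSketch lrc = new EvokedCallbackSketch();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // A real federate would send an update or interaction here.
            lrc.evoke(minSeconds);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("no wait:   " + timeIdleLoop(20, 0.0) / 1_000_000 + " ms");
        System.out.println("10ms wait: " + timeIdleLoop(20, 0.010) / 1_000_000 + " ms");
    }
}
```

With an empty queue, every iteration pays the full wait, so the loop time grows in proportion to the wait value even though no callback is ever delivered.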

Why not just blat everything out and not tick until the end!? No - idiot - you'll fill up your queue with messages from other federates and run out of memory. The queues inside an LRC need tending, and the tick()/evoke() calls are the formal handing over of power to the LRC to do that. Also, it's not really indicative of "real world" use.

So, what would be better is to use the immediate callback facilities that are available. In Portico - and I suspect every other RTI - this basically means that a background thread is running constantly, tending to the incoming messages and invoking calls on the FederateAmbassador. You no longer need to call the tick/evoke methods, nor do you need to worry about handing over time to tend the queue so that it doesn't grow too big. It's just done for you.
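A rough sketch of that background-thread idea, again as a toy model rather than Portico's implementation. ImmediateCallbackSketch, start, enqueue, shutdown and deliverAll are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model of immediate delivery: a dedicated thread drains the callback queue. */
class ImmediateCallbackSketch {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private final Thread deliverer = new Thread(this::drain, "callback-delivery");

    /** Start the delivery thread; from here on nobody has to tick or evoke. */
    void start() {
        deliverer.setDaemon(true);
        deliverer.start();
    }

    /** Block until a callback is queued, run it, repeat until interrupted. */
    private void drain() {
        try {
            while (true) {
                pending.take().run();
            }
        } catch (InterruptedException e) {
            // shutdown() was called: stop delivering
        }
    }

    void enqueue(Runnable callback) {
        pending.add(callback);
    }

    void shutdown() {
        deliverer.interrupt();
    }

    /** Queue `count` callbacks and wait for delivery without ever calling an evoke method. */
    static boolean deliverAll(int count, long timeoutMillis) {
        ImmediateCallbackSketch lrc = new ImmediateCallbackSketch();
        lrc.start();
        CountDownLatch delivered = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            lrc.enqueue(delivered::countDown);
        }
        try {
            return delivered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lrc.shutdown();
        }
    }
}
```

The sending thread never gives up time to the queue here; the delivery thread wakes whenever something is pending, which is the behaviour the ticket is after.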

Ramifications? You can get FederateAmbassador calls at any time - including when you're right in the middle of doing other work, so be careful about any shared state.
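One way to stay careful, sketched under the assumption that the shared state is a simple received-message counter: keep it in an atomic so the callback thread and the main loop can touch it without a lock. SharedCounterSketch and its methods are hypothetical and not taken from the test federate.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter updated from callback threads and read from the main loop. */
class SharedCounterSketch {
    private final AtomicLong received = new AtomicLong();

    /** Would be called from a FederateAmbassador callback, on whatever thread delivers it. */
    void onCallback() {
        received.incrementAndGet();
    }

    /** Safe to call from the sending thread at any time. */
    long total() {
        return received.get();
    }

    /** Simulate several delivery threads firing callbacks concurrently. */
    static long countFromThreads(int threads, int callbacksPerThread) {
        SharedCounterSketch counter = new SharedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callbacksPerThread; i++) {
                    counter.onCallback();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.total();
    }
}
```

A plain long field incremented with ++ could lose updates under the same load; richer shared state would need a lock or a concurrent collection instead.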

It would be nice as part of this ticket to maintain the ability to use evoked callbacks if a particular command line argument was given. This would let us contrast performance under each style.
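A minimal sketch of such a switch, defaulting to immediate callbacks. The --callback-mode flag name and the CallbackMode enum are assumptions made up for this example; the ticket does not name the actual argument.

```java
import java.util.Locale;

/** Hypothetical callback style selector; the flag name is illustrative only. */
enum CallbackMode {
    IMMEDIATE, EVOKED;

    /**
     * Look for "--callback-mode <immediate|evoked>" in the arguments.
     * Falls back to IMMEDIATE when the flag is absent; an unknown value
     * makes valueOf throw IllegalArgumentException.
     */
    static CallbackMode fromArgs(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("--callback-mode")) {
                return CallbackMode.valueOf(args[i + 1].toUpperCase(Locale.ROOT));
            }
        }
        return IMMEDIATE;
    }
}
```

The federate's main loop could then branch once on the returned mode: evoke on each iteration in EVOKED mode, and skip the call entirely in IMMEDIATE mode.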

Thus ends this completely unnecessary and lengthy commentary on a ticket that could have read "move from evoked to immediate callbacks"

Acceptance Criteria

Once complete:

Remaining Tasks:

timpokorny commented 9 years ago

Updates added and merged over to master as part of PR #6.

Updated the task list in the main issue with the remaining tasks.

timpokorny commented 9 years ago

Fixed the problem with the federate hanging on resign. The fix is one small code change in the test federate (to disconnect), plus a patched Portico file.

"Removed" the slowcode code by taking the sleep away. It seems that just doing the modulo operation each tick is enough to ease the pressure ever so slightly. Dubious.

Now just have a problem with crashes when trying to send while the federate is overloaded. The pool rejects the submission of a job, and this exception bubbles up into an RTIinternalError which kills execution. This looks like openlvc/portico#63.
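For reference, the general failure mode can be reproduced with a plain java.util.concurrent pool. This is a sketch of the standard library behaviour only: it assumes a bounded ThreadPoolExecutor and says nothing about how Portico's pool is configured or how openlvc/portico#63 was resolved. SaturatedPoolSketch is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Shows how a saturated, bounded pool treats one more submission. */
class SaturatedPoolSketch {
    /** Returns true if the third submission is rejected under the given policy. */
    static boolean thirdSubmissionRejected(RejectedExecutionHandler policy) {
        CountDownLatch release = new CountDownLatch(1);
        // One worker thread and one queue slot: room for exactly two jobs.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1), policy);
        Runnable blocked = () -> {
            try {
                release.await(); // hold the worker until the test is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocked);   // occupies the single worker
            pool.execute(blocked);   // fills the single queue slot
            pool.execute(() -> { }); // no room left: the policy decides what happens
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("AbortPolicy rejects:      "
                + thirdSubmissionRejected(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("CallerRunsPolicy rejects: "
                + thirdSubmissionRejected(new ThreadPoolExecutor.CallerRunsPolicy()));
    }
}
```

Under AbortPolicy, the default, the extra job throws RejectedExecutionException, which matches the symptom described above. CallerRunsPolicy instead runs the job on the submitting thread, which slows the sender down rather than failing.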

timpokorny commented 9 years ago

Have identified the root problem in openlvc/portico#63 and patched. Need to commit updated Portico jar with the change.