PeterFidelman opened this issue 3 years ago
Great suggestion! I really like doing the tracking in an app: it allows easy extension/customization on top of a basic capability of last/max/min/average timing, with reset/clear and deadline-overrun notification. It's also easy on the cFE side, since it just means adding SB messages or an event. I've used similar patterns on previous (non-cFS) projects with great success. They were extremely useful for maintaining performance during development and for tracking margins against deadlines.
IIRC, I also tracked frame-relative start/stop times (last, max, min, average). That was a very timing-sensitive project with tight deadlines tied to detector integration times, so any timing shift had a real impact on the data.
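Frame-relative tracking of this kind might look roughly like the sketch below. It is not code from that project. It assumes microsecond timestamps from one common clock, and the names (`OffsetStats`, `FrameTiming`, `frame_record`) are hypothetical. Each task's start and stop times are stored as offsets from the frame start. The spread between the earliest and latest observed offset (the jitter) is what would reveal a shift relative to, for example, a detector integration window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical statistics for one frame-relative offset, in microseconds. */
typedef struct {
    uint32_t last_us;
    uint32_t min_us;
    uint32_t max_us;
    uint64_t sum_us;   /* running sum, used for the average */
    uint32_t count;
} OffsetStats;

/* Start and stop offsets of one task, both measured from the frame start. */
typedef struct {
    OffsetStats start;
    OffsetStats stop;
} FrameTiming;

static void offset_update(OffsetStats *s, uint32_t offset_us) {
    s->last_us = offset_us;
    if (s->count == 0 || offset_us < s->min_us) s->min_us = offset_us;
    if (s->count == 0 || offset_us > s->max_us) s->max_us = offset_us;
    s->sum_us += offset_us;
    s->count++;
}

/* Record one frame; all three timestamps are assumed to share a clock. */
static void frame_record(FrameTiming *ft, uint64_t frame_start_us,
                         uint64_t task_start_us, uint64_t task_stop_us) {
    assert(frame_start_us <= task_start_us && task_start_us <= task_stop_us);
    offset_update(&ft->start, (uint32_t)(task_start_us - frame_start_us));
    offset_update(&ft->stop,  (uint32_t)(task_stop_us - frame_start_us));
}

static uint32_t offset_average(const OffsetStats *s) {
    return s->count ? (uint32_t)(s->sum_us / s->count) : 0;
}

/* Jitter: spread between the earliest and latest observed offset. */
static uint32_t offset_jitter(const OffsetStats *s) {
    return s->count ? s->max_us - s->min_us : 0;
}
```

For example, a task that starts 100 us into one frame and 120 us into the next has a start jitter of 20 us, even if its run time is unchanged.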
Agreed, great enhancement. It should be possible to do all of this underneath the cFE "hood", so apps are unaware of it and require no modifications.
This seems like an extension of the ES performance log to me. The performance log already exists and does (or could be used to do) most of what is described here. Perhaps just a couple of things are missing:
All,
Thank you for the responses! I agree there are multiple ways of implementing this feature, each with its own benefits and drawbacks. @jwilmot's suggestion of making the performance monitoring work "underneath the hood", without special changes to applications, is exactly what I was going for. @jphickey's suggestion of building on the performance log is also an interesting option, as long as logging "indefinitely" doesn't cause performance or timing issues of its own.
I'll watch this space to see if the issue gains traction. Maybe I'll get some time to work on a patch myself, but I'm not sure exactly when I will need this feature. If that day comes, I will submit a patch. However, it's possible that some other project will "beat me to it"! I think it would be useful for anyone who wants to use cFS in a timing-critical use case.
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Describe alternatives you've considered
Here are some ways that cFE (and cFS) can measure application performance today:
In general, existing methods are limited in one of two ways: either (1) they do not track detailed timing information, or (2) they require application authors to instrument their cFS app manually, so they are not available for every app.
Additional context
Any solution must take into account that a typical cFS application spends much of its time idle, waiting for Software Bus messages. This means that simply instrumenting the CFE_ES_RunLoop() function won't give an accurate sense of how much CPU time is consumed by even a simple application such as SAMPLE_APP, because the interval between successive calls includes the time spent blocked waiting for messages.
There are many possible solutions. My suggestion is to make CFE_ES_RunLoop() send either an Event or an SB message signaling that the application has reached the top of its main loop (i.e., finished executing its previous iteration). Because each iteration is normally triggered by a wakeup message, comparing the timing of the wakeup message with this "top of loop" message gives the application's execution time.
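The pairing of the two messages could be handled roughly as in the sketch below. It is plain C rather than cFE code. Message delivery is abstracted into two callbacks, timestamps are assumed to be microseconds from one monotonic clock, and `ExecPair`, `on_wakeup`, and `on_loop_top` are hypothetical names. Execution time is measured from the wakeup to the next "top of loop" notification, so time spent blocked on the Software Bus is excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-app state for pairing a wakeup message with the
 * following "reached top of main loop" notification. */
typedef struct {
    uint64_t wakeup_us;  /* when the wakeup message was sent */
    bool     awake;      /* true between a wakeup and the next loop top */
} ExecPair;

static void on_wakeup(ExecPair *p, uint64_t now_us) {
    p->wakeup_us = now_us;
    p->awake = true;
}

/* Stores the execution time and returns true if a wakeup was pending.
 * Returns false otherwise, e.g. for the first loop top after startup. */
static bool on_loop_top(ExecPair *p, uint64_t now_us, uint64_t *exec_us) {
    if (!p->awake) {
        return false;
    }
    assert(now_us >= p->wakeup_us);  /* monotonic clock assumed */
    *exec_us = now_us - p->wakeup_us;
    p->awake = false;
    return true;
}
```

Suppose an app is woken at 1000 us and reaches the top of its loop at 1250 us, then is woken again at 2000 us and finishes at 2250 us. This sketch reports 250 us of execution per iteration. The interval between successive loop tops is 1000 us, and 750 us of that is idle time.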
A "statistics tracking" application could subscribe to both messages, compare their timings, and calculate/report any statistics desired, such as last, average, and max observed run time. Outsourcing calculations to an app means they can be easily customized or disabled per mission without modifying cFE.
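The core of such a statistics-tracking app could be as small as the sketch below. It assumes each app is identified by a small integer index and that execution times arrive in microseconds, for example from the wakeup/top-of-loop pairing. `MAX_TRACKED_APPS`, `StatsTable`, and the `stats_*` functions are hypothetical, and a real app would also need SB subscriptions and a reset command handler, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TRACKED_APPS 32  /* hypothetical table size */

/* Per-app run-time statistics, in microseconds. */
typedef struct {
    uint32_t last_us;
    uint32_t max_us;
    uint64_t sum_us;  /* running sum, used for the average */
    uint32_t count;
} RunTimeStats;

typedef struct {
    RunTimeStats apps[MAX_TRACKED_APPS];
} StatsTable;

/* Clear one app's statistics, e.g. in response to a reset command. */
static void stats_reset(StatsTable *t, uint32_t app_index) {
    assert(app_index < MAX_TRACKED_APPS);
    t->apps[app_index] = (RunTimeStats){0};
}

/* Fold one observed execution time into the app's statistics. */
static void stats_record(StatsTable *t, uint32_t app_index, uint32_t exec_us) {
    assert(app_index < MAX_TRACKED_APPS);
    RunTimeStats *s = &t->apps[app_index];
    s->last_us = exec_us;
    if (exec_us > s->max_us) s->max_us = exec_us;
    s->sum_us += exec_us;
    s->count++;
}

static uint32_t stats_average(const StatsTable *t, uint32_t app_index) {
    assert(app_index < MAX_TRACKED_APPS);
    const RunTimeStats *s = &t->apps[app_index];
    return s->count ? (uint32_t)(s->sum_us / s->count) : 0;
}
```

Because this logic lives in an app rather than in cFE, a mission could add a minimum, a histogram, or a sliding-window average without touching cFE.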
Side benefit: deadlines
I am fond of this particular implementation because it easily enables another feature: application deadlines. A deadline is a bound on execution time that triggers a configurable action when it is exceeded. It can also be thought of as a "software watchdog". Deadlines are important because they allow unexpectedly long-running applications to be detected quickly and can help mitigate the timing impact of such applications on the rest of the system.
Today, the closest analogous feature is HS (Health & Safety) Application Monitoring of the ES task execution counter. This only detects applications that get "stuck" for a long time, because HS only checks that the counters are changing (liveness). It does not check that they are incrementing at the expected rate.
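The sketch below illustrates the difference between a liveness check and a rate check. It is not how HS is implemented. The function names and the "expected delta plus or minus tolerance" rule are hypothetical. Both functions compare two samples of an execution counter taken one monitoring period apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Liveness only: the counter changed since the previous sample. */
static bool counter_is_alive(uint32_t prev, uint32_t now) {
    return now != prev;
}

/* Rate check: over one sample period the counter advanced by roughly the
 * expected number of iterations. Unsigned subtraction handles wraparound. */
static bool counter_rate_ok(uint32_t prev, uint32_t now,
                            uint32_t expected_delta, uint32_t tolerance) {
    assert(tolerance <= UINT32_MAX - expected_delta);  /* no overflow in hi */
    uint32_t delta = now - prev;
    uint32_t lo = expected_delta > tolerance ? expected_delta - tolerance : 0;
    uint32_t hi = expected_delta + tolerance;
    return delta >= lo && delta <= hi;
}
```

Take an app that should run 10 times per sample period but only ran 3 times. It passes `counter_is_alive(100, 103)` and fails `counter_rate_ok(100, 103, 10, 1)`.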
Here is my suggested way to implement deadlines. The Scheduler (SCH) application assigns each scheduled app a deadline of configurable length L. If SCH sends the application a wakeup message at time T, it expects to receive the application's CFE_ES_RunLoop() message by time T + L. If the application has not finished when the deadline is reached, SCH fires a schedule overrun event, which HS (Health & Safety) or another application can catch and act on.
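The deadline check could be structured roughly as follows. This is not existing SCH code. It assumes SCH calls a check function periodically (for example, once per minor frame) and that timestamps are microseconds from one monotonic clock. `Deadline` and the `deadline_*` functions are hypothetical. The overrun is reported once per missed deadline, not on every subsequent check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-app deadline state kept by the scheduler. */
typedef struct {
    uint64_t deadline_len_us;  /* L, configured per app */
    uint64_t wakeup_us;        /* T, when the last wakeup was sent */
    bool     pending;          /* wakeup sent, completion not yet seen */
    bool     overrun_reported; /* avoid repeating the event every check */
} Deadline;

static void deadline_on_wakeup(Deadline *d, uint64_t now_us) {
    d->wakeup_us = now_us;
    d->pending = true;
    d->overrun_reported = false;
}

/* Called when the app's "top of main loop" message arrives. */
static void deadline_on_done(Deadline *d) {
    d->pending = false;
}

/* Returns true exactly once per missed deadline; that is where the
 * schedule overrun event would be sent. */
static bool deadline_check(Deadline *d, uint64_t now_us) {
    assert(!d->pending || now_us >= d->wakeup_us);  /* monotonic clock */
    if (d->pending && !d->overrun_reported &&
        now_us >= d->wakeup_us + d->deadline_len_us) {
        d->overrun_reported = true;
        return true;
    }
    return false;
}
```

With L = 500 us and a wakeup at T = 1000 us, a check at 1400 us passes. A check at 1500 us reports the overrun if the app has not finished by then, and later checks stay quiet until the next wakeup.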
Note: I've presented a lot of detail here. I'm not tied to any of the details. My goal is to present a starting point for further discussion of whether these features are useful, and for any resulting implementation to be consistent with cFS's architecture.
Requester Info
Peter Fidelman - Blue Origin
These ideas were originally presented during a talk at Flight Software Workshop 2021 (slides).