nasa / PSP

The Core Flight System (cFS) Platform Support Package (PSP)
Apache License 2.0

Add API for obtaining system health statistics #385

Closed: jphickey closed this issue 1 year ago

jphickey commented 1 year ago

Is your feature request related to a problem? Please describe.

cFS apps (HS in particular) need to monitor and report the health of the system, especially CPU usage. Unfortunately, this information varies widely from one platform to another, and there is no standardized way of getting it via POSIX or other OS APIs. It is generally only obtainable via platform-specific access methods, such as the /proc filesystem on Linux.
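To illustrate why this is platform-specific, here is a minimal sketch of how per-core CPU usage could be derived on Linux from the `cpuN` lines of /proc/stat. The names `CpuSample_t`, `ParseCpuLine` and `CpuUsageFraction` are hypothetical and not part of PSP; the sketch assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and treats idle plus iowait as idle time.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: cumulative tick counters for one core */
typedef struct
{
    unsigned long long Busy;  /* ticks spent doing work */
    unsigned long long Total; /* all ticks, busy and idle */
} CpuSample_t;

/* Parse one per-core "cpuN ..." line in /proc/stat format.
 * Returns 0 on success, -1 if the line is not a per-core line. */
static int ParseCpuLine(const char *line, int *core, CpuSample_t *out)
{
    unsigned long long v[8] = {0};
    unsigned long long total = 0;
    int n;
    int i;

    /* Reject the aggregate "cpu " line and anything unrelated */
    if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
    {
        return -1;
    }

    n = sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu",
               core, &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 5) /* need at least user, nice, system, idle */
    {
        return -1;
    }

    for (i = 0; i < 8; ++i)
    {
        total += v[i];
    }

    out->Total = total;
    out->Busy  = total - (v[3] + v[4]); /* subtract idle and iowait */
    return 0;
}

/* Fraction of time the core was busy between two samples (0.0 to 1.0) */
static double CpuUsageFraction(const CpuSample_t *prev, const CpuSample_t *curr)
{
    unsigned long long dTotal = curr->Total - prev->Total;
    unsigned long long dBusy  = curr->Busy - prev->Busy;

    if (dTotal == 0)
    {
        return 0.0; /* no time elapsed between samples */
    }
    return (double)dBusy / (double)dTotal;
}
```

In practice the lines would be read with `fgets()` from /proc/stat at two points in time. Other operating systems expose this data through entirely different mechanisms, which is the portability gap described above.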

Describe the solution you'd like

Design an API that can obtain system health statistics. Initially it must support per-core CPU usage, but it should be extensible to arbitrary variables such as temperature, network and disk I/O stats, RAM and swap use, etc. Basically anything that is typically shown in a PC "health monitor" app.

Additional context

Initially, the CPU usage stats would allow nasa/HS#3, nasa/HS#4, and nasa/HS#85 to be resolved.

Requester Info

Joseph Hickey, Vantage Systems, Inc.

skliper commented 1 year ago

Would it make sense to also consider adding APIs for task-related info? Stack use, CPU use, CPU affinity, whatever else might be available on that OS?

skliper commented 1 year ago

I ask partly because there are cases where CPU use is 100% by design, and what matters more is how much CPU is being used by the lowest-priority "background" task. And getting/monitoring stack use seems like it should be a requirement for complex embedded systems...

jphickey commented 1 year ago

Yes, I'm thinking of basing it on key/value pairs, with values as floats or fixed-point ints. Keys could be arbitrary strings. Then we'd just publish a list of "common" key names for things like system CPU use or temperature. HS could attempt to read variables by key name; if a key doesn't exist on that platform, the call would simply return not implemented/unknown. For the keys that do exist on that platform, HS could report the value (or act on it in some other way).

This would make it easy to extend with as many platform-specific sensors as you need (e.g. disk temperature, which some hardware has a sensor for and some does not).
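The key/value idea above could be sketched roughly as follows. This is only an illustration under assumptions: `SysHealth_GetValue`, the status codes, the key name `cpu_total`, and the placeholder reader are all hypothetical and are not an existing PSP API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes */
#define SYSHEALTH_SUCCESS          0
#define SYSHEALTH_NOT_IMPLEMENTED (-1)

/* Each key maps to a platform-specific reader function */
typedef int (*SysHealth_ReadFunc_t)(double *value);

typedef struct
{
    const char          *Key;
    SysHealth_ReadFunc_t Read;
} SysHealth_Entry_t;

/* Placeholder reader: a real platform would query the OS or a sensor here */
static int ReadCpuTotal(double *value)
{
    *value = 0.25; /* dummy value for illustration */
    return SYSHEALTH_SUCCESS;
}

/* Per-platform table; supporting a new sensor means adding one row */
static const SysHealth_Entry_t SysHealth_Table[] = {
    {"cpu_total", ReadCpuTotal},
};

/* Look up a variable by key name. Unknown keys are not an error,
 * they simply report "not implemented" on this platform. */
int SysHealth_GetValue(const char *key, double *value)
{
    size_t i;

    for (i = 0; i < sizeof(SysHealth_Table) / sizeof(SysHealth_Table[0]); ++i)
    {
        if (strcmp(SysHealth_Table[i].Key, key) == 0)
        {
            return SysHealth_Table[i].Read(value);
        }
    }
    return SYSHEALTH_NOT_IMPLEMENTED;
}
```

A caller such as HS would request the keys it cares about and skip any that come back as not implemented, so the same application code could run unchanged on platforms with different sets of sensors.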

I think by adding a "scope" argument of some type, the same API could be applied to individual tasks/processes too - such as reading the RAM use or CPU use of a particular task.
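One possible shape for such a "scope" argument is sketched below. Everything here is an assumption made for illustration: the `Scope_t` type, `ScopedHealth_GetValue`, the key name `cpu_use`, and the hard-coded task rows are hypothetical stand-ins for a real platform query.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes */
#define SCOPED_SUCCESS           0
#define SCOPED_NOT_IMPLEMENTED (-1)

typedef enum
{
    SCOPE_SYSTEM, /* value applies to the whole system */
    SCOPE_TASK    /* value applies to one task, identified by TaskId */
} ScopeKind_t;

typedef struct
{
    ScopeKind_t   Kind;
    unsigned long TaskId; /* ignored for SCOPE_SYSTEM */
} Scope_t;

/* Placeholder per-task data standing in for a real OS query */
typedef struct
{
    unsigned long TaskId;
    double        CpuUse;
} TaskRow_t;

static const TaskRow_t TaskRows[] = {
    {1, 0.10},
    {2, 0.65},
};

/* Same key name, different scope: system-wide or a single task */
int ScopedHealth_GetValue(const Scope_t *scope, const char *key, double *value)
{
    size_t i;

    if (strcmp(key, "cpu_use") != 0)
    {
        return SCOPED_NOT_IMPLEMENTED; /* only one key in this sketch */
    }

    if (scope->Kind == SCOPE_SYSTEM)
    {
        *value = 0.75; /* dummy system-wide value */
        return SCOPED_SUCCESS;
    }

    for (i = 0; i < sizeof(TaskRows) / sizeof(TaskRows[0]); ++i)
    {
        if (TaskRows[i].TaskId == scope->TaskId)
        {
            *value = TaskRows[i].CpuUse;
            return SCOPED_SUCCESS;
        }
    }
    return SCOPED_NOT_IMPLEMENTED; /* unknown task */
}
```

Keeping the scope separate from the key lets the published list of "common" key names stay short, since a single name can describe both the system-wide value and the per-task value.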

jphickey commented 1 year ago

Or maybe have a set of variables named something like "/" for that type of stuff....?