nasa / cFE

The Core Flight System (cFS) Core Flight Executive (cFE)
Apache License 2.0

SB memory pool misleading reporting and ES API confusing #2558

Open skliper opened 4 months ago

skliper commented 4 months ago

Describe the bug SB isn't reporting the peak and current memory pool use correctly.

To Reproduce Inspect MemInUse and PeakMemInUse from SB (or UnmarkedMem, which is the max size minus the peak in use) and compare them with NumFreeBytes as reported by an ES pool status call.

For example I just ran a test with the following results:

MemInUse = 30739
PeakMemInUse = 35418
Where the max size is set to 524288
But as reported from ES the NumFreeBytes is 459824

So the peak of blocks that were ever allocated is 524288 - 459824 = 64464 bytes, almost double the SB-reported peak of 35418.
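For reference, the arithmetic behind that comparison as a small C sketch. The helper names are made up for illustration and are not cFE APIs; the inputs are just the numbers from the test run above.

```c
#include <stddef.h>

/* Bytes that have ever been carved out of the pool into blocks:
 * total pool size minus the never-touched remainder (NumFreeBytes).
 * Assumes num_free_bytes <= pool_size. Hypothetical helper, not a cFE API. */
size_t example_marked_bytes(size_t pool_size, size_t num_free_bytes)
{
    return pool_size - num_free_bytes;
}

/* Gap between what the pool has actually consumed and the peak that SB
 * reports. Assumes marked_bytes >= sb_peak_in_use. Hypothetical helper. */
size_t example_unreported_bytes(size_t marked_bytes, size_t sb_peak_in_use)
{
    return marked_bytes - sb_peak_in_use;
}
```

With the values above, `example_marked_bytes(524288, 459824)` gives 64464, and `example_unreported_bytes(64464, 35418)` gives 29046 bytes of pool that are consumed but do not show up in the SB peak.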

CFE_ES_GetPoolBufInfo claims to return "size of the buffer", but it returns ActualSize, not BlockSize. This could just be user confusion, but CFE_ES_GenPoolGetBlockSize is the underlying call, and it also returns ActualSize, not BlockSize.

I'd think the common use case for tracking the peak/unmarked memory in the pool is managing margin. Since SB just reports a sum of the ActualSize values (i.e. the requested sizes) rather than the block sizes actually consumed, the real margin could be significantly less than what the SB reporting implies.
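A rough sketch of why the two sums diverge, assuming a pool that rounds every request up to the next fixed bucket size and adds some per-block bookkeeping. The bucket sizes, the overhead value, and all names here are hypothetical and chosen only for illustration; they are not the configured cFE/SB values or the actual ES implementation.

```c
#include <stddef.h>

/* Hypothetical bucket sizes in ascending order (not cFE's configuration). */
static const size_t EXAMPLE_BUCKETS[] = {64, 128, 256, 512, 1024};
#define EXAMPLE_NUM_BUCKETS (sizeof(EXAMPLE_BUCKETS) / sizeof(EXAMPLE_BUCKETS[0]))

/* Hypothetical per-block bookkeeping overhead, in bytes. */
#define EXAMPLE_BLOCK_OVERHEAD 16

typedef struct
{
    size_t requested_total; /* sum of requested sizes (what SB-style tracking adds up) */
    size_t consumed_total;  /* sum of bucket sizes plus overhead (what the pool gives up) */
} example_usage_t;

/* Smallest bucket that fits the request, or 0 if the request is too large. */
size_t example_block_size(size_t requested)
{
    for (size_t i = 0; i < EXAMPLE_NUM_BUCKETS; ++i)
    {
        if (requested <= EXAMPLE_BUCKETS[i])
        {
            return EXAMPLE_BUCKETS[i];
        }
    }
    return 0;
}

/* Tally both views of usage for a list of requests.
 * Requests that fit no bucket are skipped in both totals. */
example_usage_t example_tally(const size_t *requests, size_t count)
{
    example_usage_t usage = {0, 0};

    for (size_t i = 0; i < count; ++i)
    {
        size_t block = example_block_size(requests[i]);
        if (block != 0)
        {
            usage.requested_total += requests[i];
            usage.consumed_total += block + EXAMPLE_BLOCK_OVERHEAD;
        }
    }
    return usage;
}
```

Under these assumptions, requests of 20, 70, 130 and 600 bytes add up to 820 requested bytes but consume 1536 bytes of pool, so a margin computed from the requested sizes alone would overstate what is left.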

Expected behavior There doesn't seem to be an ES API that shows the peak of actual memory in use; ES only exposes the unmarked pool memory. So I don't know that there's a way to show the exact total used with the current APIs, but reporting the unmarked amount of memory seems more useful.

Code snips SB tracking: https://github.com/nasa/cFE/blob/28a58203a56ed7c1512c79c961fadeddb5bbb7bb/modules/sb/fsw/src/cfe_sb_buf.c#L119-L125

CFE_ES_GetPoolBufInfo calls CFE_ES_GenPoolGetBlockSize: https://github.com/nasa/cFE/blob/28a58203a56ed7c1512c79c961fadeddb5bbb7bb/modules/es/fsw/src/cfe_es_mempool.c#L502

But CFE_ES_GenPoolGetBlockSize returns the actual size: https://github.com/nasa/cFE/blob/28a58203a56ed7c1512c79c961fadeddb5bbb7bb/modules/es/fsw/src/cfe_es_generic_pool.c#L419

System observed on:

Additional context NA

Reporter Info Jacob Hageman - NASA/GSFC

jphickey commented 4 months ago

I concur that the TLM stats reported for pool usage are not great. They are hard to interpret, and the common problems (i.e. fragmentation) are not apparent from the numbers reported.