Open igorluppi opened 2 years ago
Possibly related: https://github.com/nasa/cFE/issues/701, https://github.com/nasa/osal/issues/470 and the fix in https://github.com/nasa/osal/pull/472. Although this change was for binary semaphores, may need a similar "fix" for a OS_MutSemTake.
The real problem is trying to create the mutex again when the APP is restarted.
In function OS_MutSemCreate
:
return_code = OS_ObjectIdAllocateNew(LOCAL_OBJID_TYPE, sem_name, &local_id, &record);
The return_code
is not OS_SUCCESS
, its value is OS_ERR_NAME_TAKEN
, so basically as the APP did not close properly, the mutex still exists in cFE Table.
Any way can we get around that?
id_num
to delete it or recover it. Do we?There is OS_MutSemGetIdByName
... although that doesn't solve the underlying delete issue.
Really the app should clean up after exiting it's run loop that at minimum releases the mutex. See HS as an example (not of mutext cleanup, bun unregistering callbacks and closing child tasks). The CFE_ES_StopAppCmd should cause the CFE_ES_RunLoop to return false and allow the app to clean up.
Runloop: https://github.com/nasa/HS/blob/c5ef5acf65ca346270f721e22620e9321a930075/fsw/src/hs_app.c#L120 Cleanup call: https://github.com/nasa/HS/blob/c5ef5acf65ca346270f721e22620e9321a930075/fsw/src/hs_app.c#L184 Cleanup implementation: https://github.com/nasa/HS/blob/c5ef5acf65ca346270f721e22620e9321a930075/fsw/src/hs_custom.c#L138-L155
This approach is likely cleaner/safer than the possible side effects of OSAL trying to clean up and delete a locked mutex...
Basically you should never be able to return from the main app routine without unlocking the mutex. That's just bad manners.
count semaphore
with sem_initial_value = 1
works. It's a mutex but no error is raised on the release of the mutex if the releasing task is not its owner, as the true pthread_mutex
implementation does.
Moreover, the binary semaphore
not only locks in the same situation, but also deadlock cFE service as well, avoid that. I was not able to finish the software with SIGINT
(ctrl-c) after trying to stop an APP with a locked binary semaphore
.
Describe the bug An APP cannot be correctly stopped by CFE_ES if the
CFE_ES_StopAppCmd
is sent in the same moment that the APP has a mutex locked. Although the app is terminated (with errors), when we try to restart it, it doesn't work.To Reproduce Steps to reproduce the behavior:
OS_MutSemCreate
OS_MutSemTake
CFE_ES_StopAppCmd
using "SAMPLE_APP" as targetCFE_ES_CleanupObjectCallback: Call to OSAL Delete Object (ID:327691) failed. RC=-6
Expected behavior
System observed on:
Additional context An example:
Reporter Info Igor Luppi