nasa / SBN

38 stars 23 forks source link

Adding dynamic reloading and using scid for peer identification #42

Closed jbohren-hbr closed 3 years ago

jbohren-hbr commented 3 years ago

The main objective of this changeset is to support dynamic reconfiguration of SBN at runtime. In implementing this change, several other issues were identified and addressed. The scope of testing included only the UDP and Remap modules, so other modules still need to be updated to be compatible with small API changes (mostly involving adding Spacecraft ID). Testing was done on both x86 and ARMv7 architectures. No major changes were made to the SBN architecture. This has been built and tested on ground systems against cFS Integration Candidate 2021-01-19.

The contribution was split into as many mutually-exclusive commits as possible, to aide in reviewing. See commit messages in each commit for more details. Please let us how how we can best support integrating these changes.

Issues have been addressed include, but are not limited to:

Core SBN

  1. Default CFE_MISSION_SB_MAX_SB_MSG_SIZE would cause a stack overflow in SBN_RecvNetMsgs
  2. PollPeer() was no longer being called, so heartbeat / timeout functionalities in protocol modules were disabled
  3. Module-specific messages (such as SBN_UDP_DISCONN_MSG) caused the SBN_RecvNetTask to exit without un-setting Net->RecvTaskID, making re-connections impossible
  4. Peer identification was done only by ProcessorID and neither transmitted nor checked SpacecraftID
  5. ReloadTblCmd() improperly validated the notification from the TBL service as length CFE_SB_CMD_HDR_SIZE, but that message includes a 4-byte payload
  6. Functions in sbn_cmds.c use a uint8 buffer to cast as messages, which causes alignment problems on ARM
  7. Private data in SBN_PeerInterface_t and SBN_NetInterface_t was also subject to alignment issues on ARM
  8. SBN_POLL_TIME was still defined and referenced in documentation, but is no longer used
  9. SBN_SendNetMsg could return with a locked mutex
  10. SBN_ReloadConfTbl did not properly clean up resources when reloading

Remap

  1. No facilities existed to unload the filter table so that it could be reloaded

UDP

  1. Zero timeout in Recv() caused UDP tasks to busy-poll, resolved by https://github.com/CDKnightNASA/SBN/commit/89d79ceba4c6e74b056668344c1fc64666ca4dca (fixes #38)
  2. Peer identification was done only by ProcessorID and neither transmitted nor checked SpacecraftID
  3. Peer addresses were copied from address specification strings, but not null-terminated, which led to undefined behavior when peers were not on the same host
CDKnightNASA commented 3 years ago

Thank you very much for your contributions, all look great!