nasa / SBN

38 stars 23 forks source link

track CCSDS TLM secondary header byte order #19

Open CDKnightNASA opened 4 years ago

CDKnightNASA commented 4 years ago

There has been discussion at the cFS CCB that the secondary header will be standardized as big-endian aka "network order". This has somewhat been implemented for CMD secondary headers (see #17) but has not, to my knowledge, been standardized/implemented for TLM secondary headers.

Need to track this as, at some point, SwapCCSDS will be completely deprecated.

skliper commented 4 years ago

Is it possible to identify what API's you would need for SBN to not need to directly access to the CCSDS header? One of the goals for cFE is to provide the abstraction such that SBN will be portable across various implementations (different headers/MsgId schemes/etc.)

@jphickey

jwilmot commented 4 years ago

Shouldn't CFE_SB_GetMsgId and CFE_SB_GetTotalMsgLength be the APIs? What else would SBN need?

jwilmot commented 4 years ago

Looking at that again, all the header access APIs should be in cfe_sb_msg_id_util.c. Those are the things that change if the underlying format changes. cfe_sb_msg_id_util.c should maybe have been named cfe_sb_msg_access_util.c or cfe_sb_msg_abstraction_util.c

CDKnightNASA commented 4 years ago

So the problem is this--if any CCSDS headers are in platform byte order, SBN needs to swap bytes around. Until that's no longer the case, SBN will have to poke directly at the header and can't really do this via an API, unless the API had a "get/set the bytes that represent the timestamp", which wouldn't be sensible/helpful to anything but SBN.

jphickey commented 4 years ago

Can't SBN use CFE_SB_GetMsgTime and CFE_SB_SetMsgTime functions? (maybe I don't fully understand the issue).

CDKnightNASA commented 4 years ago

Can't SBN use CFE_SB_GetMsgTime and CFE_SB_SetMsgTime functions? (maybe I don't fully understand the issue).

Can't use Get/SetMsgTime 'cause it'll get/set in platform-endian form. The problem is, with SBN, the order needs to be swapped when connecting a big-endian and little-endian system together. (This is also true of the payload, but that's not the job of SBN.)

jphickey commented 4 years ago

Does any mission currently use SBN in a mixed-endian environment?

I would think this would be a non-starter, because the payload of all packets is incompatible. Unless it only forwards it through on on the other node... in which case, does it need to look at the TLM header? I would think just the primary header would be enough for routing.

jphickey commented 4 years ago

I guess I don't understand why the order of TLM headers matters for SBN... it should just pass it through. Someday the framework might fix the byte order of TLM like it does for CMD, or the framework might actually do something that actually fixes the root problem (EDS), But either way wouldn't SBN just pass through?

CDKnightNASA commented 4 years ago

CCSDS TLM timestamp created on little-endian and read on big-endian would be garbled. Once the framework standardizes on big-endian, this problem goes away for SBN.

jwilmot commented 4 years ago

Why is SBN looking at time? And yes everything in the Pri and Sec header is to be network byte order including time. EDS is the answer for payload swapping when needed.

CDKnightNASA commented 4 years ago

Why is SBN looking at time? And yes everything in the Pri and Sec header is to be network byte order including time. EDS is the answer for payload swapping when needed.

SBN is only "looking" at time to "correct" it to the platform-endian format of the system so that anybody reading the messages off the local SB will see the correct timestamp no matter what endian-format the sender sent it as.

jwilmot commented 4 years ago

Ok. I wonder then if CFE_SB_GetMsgTime and CFE_SB_SetMsgTime should be fixed? If the message is always big and the local system needs to set or get than maybe CFE_SB_GetMsgTime and CFE_SB_SetMsgTime should do it. SBN does it to every message. CFE_SB_GetMsgTime and CFE_SB_SetMsgTime need only do it when called.

jphickey commented 4 years ago

SBN is only "looking" at time to "correct" it to the platform-endian format of the system so that anybody reading the messages off the local SB will see the correct timestamp no matter what endian-format the sender sent it as.

Seems like an unnecessary feature to me -- why would a subscriber look at the TLM header without looking at the payload? Or put another way - since the payload is always unreadable, what is the objective of making the TLM header readable? Any subscribers would be equally broken without the payload being readable.

I had noted in a comment as part of the nasa/cfe#297 discussion that any change to "fix" the CMD headers to a specific byte order should also be applied to the TLM header at the same time, but it was shot down. See https://github.com/nasa/cFE/issues/297#issuecomment-595435305. Now the build is "half and half" (CMD is always network byte order, where TLM remains native byte order as it was). I thought there was an new issue about this, but I can't find one.

jphickey commented 4 years ago

Note - submitted nasa/cFE#628 to complete the work started in nasa/cfe#297

skliper commented 4 years ago

See https://github.com/nasa/cFE/issues/92... still pending the definition of a standard that we can apply. There's multiple secondary header definitions and if I recall there was quite a bit of discussion as to what "secondary" header actually meant, the time field can be redefined, etc. We need a standard we can apply.

jphickey commented 4 years ago

What document does the graphic in this comment come from: https://github.com/nasa/cFE/issues/297#issuecomment-595294478

Does this document also define the TLM header in a similar means?

CDKnightNASA commented 4 years ago

SBN is only "looking" at time to "correct" it to the platform-endian format of the system so that anybody reading the messages off the local SB will see the correct timestamp no matter what endian-format the sender sent it as.

Seems like an unnecessary feature to me -- why would a subscriber look at the TLM header without looking at the payload? Or put another way - since the payload is always unreadable, what is the objective of making the TLM header readable? Any subscribers would be equally broken without the payload being readable.

Difference is we encourage users to use the CFE-provided macros/functions when reading CCSDS headers, but we do no such encouragement for reading the(ir) payload.

The point of my original ticket is that SBN currently byte-swaps the timestamp. Once CCSDS (and, therefore CFE, although the order may be reversed) standardizes on secondary headers being big-endian, SBN no longer needs to care about timestamps (in fact, it would be breaking things for SBN to manipulate timestamps.)

skliper commented 4 years ago

What document does the graphic in this comment come from: nasa/cFE#297 (comment)

Does this document also define the TLM header in a similar means?

I wish. Unfortunately I'm not aware of GSFC adopting a single "standard" TLM secondary header across missions that we can reference. There are mission standards, but they haven't consolidated around a single solution that I know of. I think that's why cFE still supports all the different CFE_MISSION_SB_PACKET_TIME_FORMAT options...

skliper commented 4 years ago

And they very likely don't define it as big endian on little endian missions, where-as the CMD secondary header is defined as endian agnostic...

jphickey commented 4 years ago

That is unfortunate -- but still, it would be fairly trivial just to make all the CFE_MISSION_SB_PACKET_TIME_FORMAT into network byte order, would it not? We would just have to update the SB API to read it in an endian-neutral way rather than a memcpy().

Obviously, the solution I would push for is an EDS, because it solves the payload problem as well as the header problem. But in the interim it seems like releasing a CFE with CMD in an fixed byte order without doing as similar fixed byte order for TLM would confuse users. The same argument of "no standard TLM format is defined" can be equally applied to justify why we did change it, as it can be applied to say why we didn't change it. That is, if missions failed to define an endianness of the header, then choosing network byte order wouldn't be wrong.

skliper commented 4 years ago

I completely agree it's trivial, there are macro's already defined in ccsds.h. The problem is we don't have a standard to apply, from my experience it has always been a mission option (typically LE missions don't report TLM secondary header time in BE). Missions do define it, and they define it differently. As soon as someone with the authority and customer concurrence to define a standard (or set of standard options) across our user base does so, I'll happily support applying it. The CMD secondary header was always (to my knowledge) defined endian agnostic and flipping it was a bug. I have no similar standard to apply for the TLM secondary header... there is no standard definition to apply until someone defines one which I have yet to see. Maybe it needs to be an option, who knows without a standard/requirement?

Basically I'm just saying someone needs to own this issue, get consensus across the community (with stakeholder's involved), and come up with a standard (or define the set of supported options, or define the API implemented in a library such that projects can re-implement it however they want, or whatever). I'm not in the habit of changing flight code that has cross-center and international impacts based on someone just saying a secondary header that has no defined standard has to be big endian.

jphickey commented 4 years ago

As far as I know it was only an internal GSFC document that had the CMD secondary header documented that way. There was no published/external standard for it. In the cross-center/agency/international user context there were tools written that correlate to how the CFE had defined this field (as a uint16 in FSW target byte order) which are now broken, but we likely won't see those complaints bug reports until the next CFE gets released. (again, CFE 6.4.0 did this and as soon as it got into an official release it had to be hotfixed back to the way it was, due to user pushback, and most of these users only take "official releases", it its likely the same confusion will happen here).

This was precisely the same argument I was previously making against changing the CMD secondary header, because there was an established userbase expecting it a certain way, that don't have whatever GSFC document claims to define it in an endian neutral way.

So given that we changed it anyway -- if we are going to force external users to adapt to a "standard" for the CMD header that is different than what they are used to (and a standard that they haven't seen if they are outside GSFC), why is it a problem to do the same for the TLM header?

From comments that @jwilmot has made I get the impression there is already some expectation that this header is in network byte order.

skliper commented 4 years ago

CMD 2ndary header endian dependency was a bug, reported multiple times by our major stakeholders, It's unfortunate this fix will have community impact, but it is what it is. TLM secondary header has no established standard for endianness that I know of that has been agreed to by our major stakeholders.