tillitis / tillitis-key1

Board designs, FPGA verilog, firmware for TKey, the flexible and open USB security key 🔑
https://www.tillitis.se
382 stars 24 forks source link

Framing protocol: more expressive error responses possible #228

Open cobratbq opened 1 month ago

cobratbq commented 1 month ago

I want to put this information out there, maybe it is too late and it is understable if you close the issue. I just want to make sure you are aware that the possibilities exist within the room offered by the framing protocol.

With the fields that are available in the framing protocol, it seems to be possible for more granular, expressive bad/error response messages. These suggestions avoid the need for magical values (all zeroes, all ones, etc.) or patterns, simply because the frame-header can be sufficiently expressive without.

These are the possibilities (cases) I distinguished:

  1. ResponseNotOK + Command-0 + Length-1 to indicate problem with header: non-existent command, no actual body, so ResponseNotOK can only refer to parts from frame-header: DST, forbidden bits, etc.
    This allows responding to misdirected firmware/software commands, such that ignoring the message is not necessary. Ignoring the message is problematic because you expect the client-software to draw their own conclusions from that, in addition to having to wait for a time-out.

  2. ResponseNotOK + <command> + Length-1 to indicate problem with command: no actual body, so unsupported/unknown/illegal command. One can discuss whether "unknown/unsupported" and "wrong program state for command" should be considered same.
    This allows you to indicate that the command itself is the problem (at "this" time).

  3. ResponseNotOK + <command> + Length-{4,32,128} to indicate problem with command-body: even if the return body is irrelevant (e.g. all-zeroes), the fact that Length-1 was not chosen can function as indicator that the body rather than the command is the problem. One can even distinguish sub-cases:

    1. the correct body-length: (likely) indicating the expected response is known.
    2. another, incorrect body-length: (likely) indicating that it had different expectations for the response. (It's up to the client-application on how to interpret that. Which will likely be different based on complexity, strictness of the software involved.)

As a personal opinion, I am not too fond of using a single pool of ID-values for commands and responses. It doesn't seem realistic to consider both ends peers in capabilities/requests. So why not have a command 1 respond with a response 1? (And other IDs if there are more responses, or reusable responses.)

An additional problem with separate command and response IDs: _if some application does not know the (firmware) command, how to respond if the corresponding response ID needs to be supplied_? In that light, with the error-communication as outlined above, any program can sensibly respond to any request.

I have not found a good reason why, possibly in light of error-communication, different response-IDs are necessary. They don't hurt. However, I have not found added benefits.

edit this also should remove the need for juggling with the timeout values, as it is possible to respond with a sensible error response. (See https://github.com/tillitis/tkeyclient/pull/9)

dehanj commented 3 weeks ago

Thanks for reaching out and engaging with us.

This is an interesting topic and I would say that it is almost already possible to start to use. When reading through our Framing Protocol we don't really say how to handle errors, so I would say that your suggestion is within the realm of what is "considered allowed". The framing protocol actually lets the user handle the errors as one wants, as long as the fields are correct - which I would say your take above seems to be.

I looked through our implementation of the protocol in tkeyclient, and I can see that we do read the entire packet if the response is ResponseNotOK, but unfortunately we do not return the actual payload - we return nil, header and error instead. I would say that we should change it to return the payload, the header and the error so it is up to the application to parse and handle the error instead of making decisions for the application. This will create more flexibility within the scope of the framing protocol. I will create an issue in tkeyclient for this.

So in general, if I'm not missing something, I would not say that it is to late and that one are free to use it like above. We do not use it like this in our applications at the moment, but most of them so far are quite small that any error simply kills the application, or it aborts the current command and waits for the next (depending on error).

Are you using the errors, like you described above, in your applications? Oh, and one general thing. We don't even dictate that an app have to use our framing protocol after an app is loaded onto the TKey and started. It is actually free to use whatever afterwards. We are however recommending to use the same protocol, since compatibility is easier to manage.

In the long run we are looking into if we should use something completely different, like protobuf or similar. Where we can have a config that spans over multiple applications, and that handles the packing of packages for you. But it is a discussion that has barely started.

cobratbq commented 2 weeks ago

I looked through our implementation of the protocol in tkeyclient, and I can see that we do read the entire packet if the response is ResponseNotOK, but unfortunately we do not return the actual payload - we return nil, header and error instead.

This is interesting, because I think the documentation states that it should return ResponseNotOK with an all-zeroes (excl. command-byte) payload body. The firmware also behaves slightly differently.

FW_RSP_NAME_VERSION: In a bad response the fields will be all zeros (0x00 bytes).

I looked into this because I was looking for a way to respond appropriately to firmware-destined requests, such that the client can discover that TKey isn't in firmware. Unfortunately, the current solution for this error-case is that of not responding. This gives the complication that one has to work with a read-timeout, and consequently "guessing that data will not arrive".

Problems that cannot be expressed, AFAICT:

  1. Firmware will not allow application-requests. (It could respond with ResponseNotOK, as stated in my proposal.)
  2. Programs do not know how to respond to firmware-requests. (Your programs do not respond at all, which works but requires working with time-outs as there is no event to respond to.)
  3. There is no convention that allows responding to unknown requests. (Response is expected to carry corresponding responseID, although apparently this is later dropped too.) Arguably, any response after a request could be causated to that request, but that fails when multiple frames are sent.

Note: As was stated in another issue already, the App name and version cannot be blindly trusted, and given that I have other ways to validate the claim. The client will still use App name and version to determine if program is expected and if so to determine which API to use.

Are you using the errors, like you described above, in your applications?

Yes. I am exploring what the most suitable mechanisms are, hence this issue report. There are other considerations not immediately relevant here. I noticed that the mechanisms already provided as part of the header/protocol are not embraced to their full potential.

We do not use it like this in our applications at the moment, but most of them so far are quite small that any error simply kills the application [..]

Understandable. That would be my preference if I could. This program is a bit bigger, and I'd like the client to be able to adjust if needed.

We don't even dictate that an app have to use our framing protocol after an app is loaded onto the TKey and started.

I realized that. I'm reaching out to make you aware of the possibilities within your chosen direction.

In the long run we are looking into if we should use something completely different, like protobuf or similar.

This seems nice. Maybe also very ambitious at the same time. I don't know what the reasonable capacities are. For example, I assume there is a reason that you selected a fairly small 1+128 byte frame. You'd probably need a bit more throughput if a design is chosen with more overhead. (I don't know protobufs well, but given that the corresponding schema is expected to be known, I would think there is fairly little overhead.) Also, that would require protobuf support in the TKey-program, so affects the program size.

So far, I find that I can do quite a bit with the basic mechanisms that are in place now.

dehanj commented 1 week ago

Yes, you are correct that the firmware never will respond with a NOK. I think this might be that the framing protocol was written a while back and the firmware has most likely been re-written, or at least updated along the way and hence they are not perfectly aligned. But I'm not exactly sure, they both existed before I got involved.

To answer to the problems you list:

  1. In the firmware today, it is written to be extremely strict and not allow undefined commands. It interpret other commands as suspicious and will go to a failed state. Probably not always 100% desirable. I could agree on that a command destined for an app could have been responded with an NOK, since it won´t parse the data anyhow.

  2. That is not completely true, as an example the tkey-random-generator responds with an NOK to requests aimed for firmware. Which I would claim is the idea, and even if the tkey-device-signer does not do it today, I think it should..

  3. I would say NOK works for this as well. Since you don't know anything about that request that is the most sensible thing you can do. What do you mean with "Response is expected to carry corresponding responseID, although apparently this is later dropped too."? There is two mechanisms for correlating a response to a request. First the command ID and the response ID, and if you are sending multiple frames with the same command/response ID you should use the frameID tag. We have an experimental implementation where we send mutliple frames at a time, before we get a response.

But I do agree with you in the command/response ID, I'm not a fan of having command and response IDs, from my point of view they should be the same and then let the command payload handle the variations if there would be any.

I think one of the reasons for 128+1 size is stability in the original usb-to-serial converter, which is fixed since a while back. This was again before my time here, but I do think sizes up to 512 bytes was experimented with.

Anyhow, all breaking changes to the framing protocol is hard to introduce, but we do appreciate your input.

mchack-work commented 1 week ago

I may be misunderstanding something you're talking about but tkey-device-signer does respond to a firmware probe here:

https://github.com/tillitis/tkey-device-signer/blob/main/signer/main.c#L371

If you send something with the byte header endpoint set to 2 (that is, firmware), it replies NOK.

This is also what we rely on in for example tkey-ssh-agent to probe for firmware. If we do get a NOK, it will try to figure out if it's the right device app that is running by asking it for a name and version. Not at all secure, but better than nothing.

Oh, and yes, the small sizes and the command/respone on every packet is a legacy from when the communication wasn't very reliable. It has been hard to change the protocol since, of course. It will probably happen, as dehanj mentions, and probably using something like Protobuf, Cap'n Proto, CBOR or something else that is easy to generate code for. If we can fit it within the size requirements.

mchack-work commented 1 week ago

Regarding the firmware probe I added some proposed changes to the Dev Handbook:

https://github.com/tillitis/dev-tillitis/pull/50

cobratbq commented 1 week ago

I may be misunderstanding something you're talking about but tkey-device-signer does respond to a firmware probe here:

No, you aren't misunderstanding. Your counter-examples are perfectly on point. I remember finding a case, first place I looked, that didn't respond. I will try to find it again. Hmmm... actually, I think I looked at the firmware returning -1 without a ResponseNOK message. But then, as you mentioned, the firmware is very strict.

cobratbq commented 1 week ago

3. I would say NOK works for this as well. Since you don't know anything about that request that is the most sensible thing you can do.

Okay, so what I meant is: a request with command ID 1 expects a response with command ID 2, i.e. one incremented. That's merely a convention. Because it is by convention, there is a assumption if you respond to an unknown command: ResponseNOK for command-byte + 1. (I think you also confirmed that request 1 with response 1 is more convenient.) That's why I pointed it out.

What do you mean with "Response is expected to carry corresponding responseID, although apparently this is later dropped too."?

This just referred to the bug mentioned earlier in this issue, that the response body was not returned if ResponseNOK. No worries, this was merely intended as summary.

There is two mechanisms for correlating a response to a request. First the command ID and the response ID, and if you are sending multiple frames with the same command/response ID you should use the frameID tag.

This is interesting: the way that I know how frame-IDs are used, is to distinguish between multiple (parallel) in-flight requests. So I would only use frame-ID for concurrency (concurrent design), so either: parallel request handling, or to handle a secondary request before the first request can be concluded.

For example:

  1. receive Req1 with frame-ID 0, do some initial handling .. waiting (but not blocking) to "complete",
  2. receive Req2 with frame-ID 1, handle Req2 and respond with frame-ID 1,
  3. now complete Req1 respond with frame-ID 0.

Consequently, from the perspective of a single frame-ID, interaction is strictly linear: request followed by corresponding response. This would allow concurrent interactions without confusing requests/responses of parallel in-flight requests. (Note: this is akin to request-multiplexing in HTTP/2 and QUIC.)

Now for me, given that -- at the moment -- I process one request at a time in blocking fashion, I constantly reuse the same frame-ID. No multi-threading, no concurrency. (Yet)

We have an experimental implementation where we send mutliple frames at a time, before we get a response.

I'll have a look there.


My current approach to larger requests, is the following: (i.e. the current experiment)

  1. In usual cases, i.e. unless otherwise specified, respond to request command-byte X with response command-byte X, e.g. respond to 3 with 3. This provides room for up to 255 (excl. 0) different commands.
  2. Use the most significant bit (0b10000000/0x80) to indicate follow-up frames. Meaning that if the bit is set, the frame cannot/should not be independently processed.

This still provides space for up to 127 distinct commands (excl. 0), and allows for generic logic to semantically recognize subsequent frames for same request. (So they can be dropped/ignored.) This is usually not necessary, i.e. if program knows to expect multiple frames, it can just read them. However, if an error occurs and program returns control flow to generic request-handling logic, then the logic knows which frames to skip that are unhandled frames of a previous (failed or otherwise) request.

Simple bit-masking is sufficient to read the flag or the command independently. It also means that large payloads can simply be cut up at the frame-length border and transmitted in multiple frames as we simply set the 0x80 bit after the initial frame.


I didn't mention this before, because I do not think this needs to be part of a protocol. This idea operates within the space offered by the protocol. I am elaborating this now, because of your mention of use of the frame-ID for which I seem to have a different purpose in mind.

dehanj commented 1 week ago

First, my mistake. I missed the NOK response already in the read_command(), i was looking in started_commands().

This is interesting: the way that I know how frame-IDs are used, is to distinguish between multiple (parallel) in-flight requests. So I would only use frame-ID for concurrency (concurrent design), so either: parallel request handling, or to handle a secondary request before the first request can be concluded.

That is correct, and is to some extent how we use it in the experiment as well (unless you are referring to multi threaded request handling which it is not). Basically, in the experiment it is using the same command for all request (loading an app), simply filling up the FIFO buffer on the TKey, so the Tkey can run at its "max capacity" and does not have to wait for the next request, it is already in the FIFO buffer.

1. client send cmdX, frame ID 0 
2. client send cmdX, frame ID 1
3. client send cmdX, frame ID 2
4. client send cmdX, frame ID 3
5. tkey process cmdX with frame ID 0, respond with cmdY, frame ID 0
6. client send cmdX, frame ID 0 (new request, reuse "empty frame ID")
7. tkey process cmdX with frame ID 1, respond with cmdY, frame ID 1
8. client send cmdX, frame id 1 
9. .. and so on until done ..

In this experiment it is still parsing the frames linearly, since that is how the implementation of loading an app in fw is (this was partly an experiment to confirm that the bug in the USB-to-serial converter was fixed, so changing fw was out of scope). But it does not need to be linearly, as long as the application has enough information in the payload (such as offset), then each frame could be completely independent, and hence be parsed in whatever order. The client would keep track of which frame is lost/gets an error for with the frame ID.

Now for me, given that -- at the moment -- I process one request at a time in blocking fashion, I constantly reuse the same frame-ID. No multi-threading, no concurrency. (Yet)

We do the same, reuse the same since we don't expect multiple in-flight packets.

Anyhow, my point was with the comment from the start that since there is a frame ID, if you don't know the proper response command, replying with a NOK with the same frame ID can be used to correlate it to the request.

I didn't mention this before, because I do not think this needs to be part of a protocol. This idea operates within the space offered by the protocol. I am elaborating this now, because of your mention of use of the frame-ID for which I seem to have a different purpose in mind.

Nice approach. Looks sound, we might take inspiration in some future application.

cobratbq commented 1 week ago

That is correct, and is to some extent how we use it in the experiment as well (unless you are referring to multi threaded request handling which it is not). Basically, in the experiment it is using the same command for all request (loading an app), simply filling up the FIFO buffer on the TKey, so the Tkey can run at its "max capacity" and does not have to wait for the next request, it is already in the FIFO buffer.

1. client send cmdX, frame ID 0 
2. client send cmdX, frame ID 1
3. client send cmdX, frame ID 2
4. client send cmdX, frame ID 3
5. tkey process cmdX with frame ID 0, respond with cmdY, frame ID 0
6. client send cmdX, frame ID 0 (new request, reuse "empty frame ID")
7. tkey process cmdX with frame ID 1, respond with cmdY, frame ID 1
8. client send cmdX, frame id 1 
9. .. and so on until done ..

Right, here it seems our approaches differ. You still respond to every frame, while I send all frames with FrameID 0 and only send a final response for the command-message as a whole. (So I send only 1 response for 1 request with possibly many follow-ups frames.) For me, a request is 1+ frames, even though the first byte of each frame-payload carries a command-byte.

I understand your use of the frame-ID now, because it allows you to send 4 frames and still receive and correlate the response for all of the them. (As you explain; I'm summarizing again.)

In this experiment it is still parsing the frames linearly, since that is how the implementation of loading an app in fw is (this was partly an experiment to confirm that the bug in the USB-to-serial converter was fixed, so changing fw was out of scope). But it does not need to be linearly, as long as the application has enough information in the payload (such as offset), then each frame could be completely independent, and hence be parsed in whatever order. The client would keep track of which frame is lost/gets an error for with the frame ID.

That's clear.

Now for me, given that -- at the moment -- I process one request at a time in blocking fashion, I constantly reuse the same frame-ID. No multi-threading, no concurrency. (Yet)

We do the same, reuse the same since we don't expect multiple in-flight packets.

Anyhow, my point was with the comment from the start that since there is a frame ID, if you don't know the proper response command, replying with a NOK with the same frame ID can be used to correlate it to the request.

Yup, got that. I guess my point is that:

FW_RSP_NAME_VERSION (0x02): [..] In a bad response the fields will be all zeros (0x00 bytes).

Isn't that meaningful anymore, right? This suggests responding with response-byte and NOK. Your solution to respond with NOK and use frame-ID to correlate to request, is sufficient to signal the error. So the response command-byte provides no extra meaning.

Maybe this isn't something for which follow-up discussion is useful. I pointed out these distinct error cases, as there seems to be some ambiguity and also room for more expressiveness, and you already explained that the protocol is a general indication. I understand your approach with using the frame-IDs for correlating individual requests to responses. That makes sense. As you mention, to some extent the direction is set. Thanks for clarifying.

dehanj commented 1 week ago

FW_RSP_NAME_VERSION (0x02): [..] In a bad response the fields will be all zeros (0x00 bytes).

Isn't that meaningful anymore, right? This suggests responding with response-byte and NOK. Your solution to respond with NOK and use frame-ID to correlate to request, is sufficient to signal the error. So the response command-byte provides no extra meaning.

You are right, and as you mentioned earlier firmware does behave slightly different than stated. I will see into if an overhaul of the protocol documentation is due, to minimize some ambiguity and make it clearer.

Sure, thanks for the discussion. We appreciate the feedback.