Closed martinthomson closed 8 months ago
Thanks, these are good points. Some quick reactions:
The section is silent on the fact that it is the application's responsibility to negotiate a cipher suite for SFrame. And if media traffic using SFrame is bi-directional, then there may be different cipher suites for each direction. [... and likewise for versions]
Yes, this is accurate. Much like media codecs, SFrame is intended to be used in environments where there's a bunch of signaling that's been done already. We already assume that the application sets up keys, and I agree we should be explicit that the app needs to choose the version / ciphersuite as well.
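To make the per-direction point concrete, here is a minimal sketch of how an application's signaling layer might pick parameters independently for each direction. Everything in it is an assumption for illustration: the `SFrameParams` type, the `select` helper, the version number `1`, and the cipher suite values are placeholders, not anything defined by the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFrameParams:
    version: int       # placeholder version number agreed in signaling
    cipher_suite: int  # placeholder cipher suite code point


def select(offered, supported):
    """Return the first parameter set the sender offers that the receiver supports."""
    for params in offered:
        if params in supported:
            return params
    raise ValueError("no common SFrame parameters")


# Each direction is negotiated on its own: sender preferences vs. receiver support.
alice_offers = [SFrameParams(1, 0x0004), SFrameParams(1, 0x0001)]
bob_supports = {SFrameParams(1, 0x0001)}

bob_offers = [SFrameParams(1, 0x0004)]
alice_supports = {SFrameParams(1, 0x0004), SFrameParams(1, 0x0001)}

alice_to_bob = select(alice_offers, bob_supports)
bob_to_alice = select(bob_offers, alice_supports)
```

In this sketch the two directions end up with different cipher suites, which is the case the review comment asks the document to acknowledge.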
Table 1 lists currently defined cipher suites. While Nk, Nn, and Nt are defined above in section 4.4, Nh is only defined below the table, in section 4.5.1. In addition, this section also uses Nka, which is not present in the table. All this adds confusion for readers.
Fair point. I will plan to update Table 1 to have an Nka column (which is n/a for some ciphers), and update section 4.4 to introduce all the constants.
Section 6.2 describes a potential problem that may occur when a new participant joins a conference. It is not clear to me why this section only mentions video frames. Unless I'm missing something, the same situation may occur with other media frames (e.g. audio), so explicitly mentioning only video frames adds confusion.
The difference with audio is that in an audio stream, each frame contains enough data to decode the media and play it out. With video, the non-key frames are like diffs on the key frame, so the receiver can't use them unless it has the key frame. So if there's a key frame followed by 99 non-key frames, and the key roll-over happens on frame 5, then the receiver can decrypt frames 6..100, but can't use any of them. Whereas with audio, those frames could all be used immediately.
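The example above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: frames are labelled `"key"`, `"delta"`, or `"audio"`, the new participant can decrypt from frame 6 onward, and the hypothetical `usable_frames` helper treats a video frame as usable only once a key frame has been decrypted.

```python
def usable_frames(frame_kinds, first_decryptable, needs_key_frame):
    """Return the 1-based indices of frames a late joiner can actually play out."""
    usable = []
    have_key_frame = False
    for index, kind in enumerate(frame_kinds, start=1):
        if index < first_decryptable:
            continue  # encrypted under a key the joiner never received
        if kind == "key":
            have_key_frame = True
        if not needs_key_frame or have_key_frame:
            usable.append(index)
    return usable


# One key frame followed by 99 non-key frames; key roll-over on frame 5,
# so frames 6..100 can be decrypted.
video = ["key"] + ["delta"] * 99
audio = ["audio"] * 100

video_usable = usable_frames(video, first_decryptable=6, needs_key_frame=True)
audio_usable = usable_frames(audio, first_decryptable=6, needs_key_frame=False)

print(len(video_usable))  # 0: every decryptable frame depends on the missed key frame
print(len(audio_usable))  # 95: frames 6..100 are independently decodable
```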
I also wonder whether SFrame needs so many reserved IANA codepoints. The draft allocates 5 codepoints out of 2^16 space. ...
The counter-argument here is that there's benefit in not having to worry about code point exhaustion. For example, TLS has benefitted from its large ciphersuite space to do things like GREASE. Here, it could allow us to accommodate an application that really needs 1-byte code points (or shorter) by reserving all the code points with specific high-order bits, and letting the application use code points in that space. There are also more complicated, varint-like approaches here, but I'm inclined to keep it as is.
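As a rough sketch of the high-order-bits idea, suppose all 16-bit code points sharing one fixed high byte were reserved for a constrained application, which would then carry only the low byte on the wire. The prefix value `0xFF` and the helper names below are hypothetical; nothing like this is defined in the draft or the registry.

```python
RESERVED_HIGH_BYTE = 0xFF  # hypothetical reserved prefix, not an actual allocation


def to_short(code_point):
    """Map a 16-bit code point in the reserved block to its 1-byte form."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("code point must fit in 16 bits")
    if code_point >> 8 != RESERVED_HIGH_BYTE:
        raise ValueError("code point is outside the reserved block")
    return code_point & 0xFF


def to_full(short):
    """Expand a 1-byte value back to the full 16-bit code point."""
    if not 0 <= short <= 0xFF:
        raise ValueError("short form must fit in one byte")
    return (RESERVED_HIGH_BYTE << 8) | short


full = to_full(0x01)     # 0xFF01
short = to_short(full)   # 0x01
```

The mapping is a fixed bijection over 256 values, so the constrained protocol saves a byte without needing a separate registry.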
Valery Smyslov had some good feedback on the document.
Issues: I have a few issues with this document that are easy to address. Section 9 describes Application Responsibilities for using SFrame. I think that it lacks some important things (that might look obvious, but still need to be spelled out, IMHO).
The section is silent on the fact that it is the application's responsibility to negotiate a cipher suite for SFrame. And if media traffic using SFrame is bi-directional, then there may be different cipher suites for each direction.
SFrame itself doesn't have any versioning. Although it is very simple, protocols tend to evolve, so there is no guarantee that SFrame v2 will never appear. Thus, it is the application's responsibility to negotiate the use of a concrete version of SFrame. This can be done by various means; application developers should at least take this into account.
Nits:
Table 1 lists currently defined cipher suites. While Nk, Nn, and Nt are defined above in section 4.4, Nh is only defined below the table, in section 4.5.1. In addition, this section also uses Nka, which is not present in the table. All this adds confusion for readers.
Section 6.2 describes a potential problem that may occur when a new participant joins a conference. It is not clear to me why this section only mentions video frames. Unless I'm missing something, the same situation may occur with other media frames (e.g. audio), so explicitly mentioning only video frames adds confusion.
Considerations: I also wonder whether SFrame needs so many reserved IANA codepoints. The draft allocates 5 codepoints out of a 2^16 space. I can imagine that a few tens more may be allocated in the future (even with a fairly unrestrictive IANA policy such as "Specification Required"). Did you consider using a 1-byte range for codepoints? I am mostly thinking about the use of SFrame in hypothetical constrained protocols, where ciphersuites might be represented in 1 byte. These are just some considerations that might be worth thinking about.