Address comments from Adam Roach

From https://datatracker.ietf.org/doc/draft-ietf-perc-srtp-ekt-diet/ballot/#adam-roach

§1:

EKT provides a way for an SRTP session participant, to securely transport its SRTP master key and current SRTP rollover counter to the other participants in the session.

Nit: "...participant to securely..."

Already fixed on master.

§4.1:

EKTMsgTypeExtension = %x03-FF

Shouldn't this be "%x01 / %x03-ff" ?

I think the idea here is to forbid the use of 0x01, treating it as a reserved value (due to legacy, pre-standard implementations).

SRTPMasterKeyLength = BYTE SRTPMasterKey = 1*256BYTE

I think this either needs to be "1*255BYTE", or we need text that explicitly indicates that an SRTPMasterKeyLength value of 0x00 means "256 bytes." Probably the former.

I think this is even further constrained by the fact that EKTCiphertext is limited to 256 bytes, and contains the SRTPMasterKeyLength, SRTPMasterKey, SSRC, and ROC (and is not compressed) -- which means the SRTPMasterKeyLength can't be more than (256 - 1 - 4 - 4 =) 247 bytes. So perhaps "1*247BYTE" is more appropriate?

I think you're looking in the wrong place, actually. There are two levels of constraint here:

len(EKTCiphertext) < 256 - len(Epoch) - len(SPI) len(SRTPMasterKey) < len(EKTCiphertext) - len(SSRC) - len(ROC) - len(SRTPMasterKeyLength) - Cipher.Overhead

So we can make the following bounds...

EKTCiphertext = 1251BYTE ; 255 - 2 - 2 SRTPMasterKey = 1242BYTE ; 251 - 1 - 4 - 4 - Overhead=0

... but the maximum SRTPMasterKey will never be maximized in practice because the crypto overhead will be non-zero. And we can't really nail it down here in the ABNF because different ciphers will have different overhead.

In any case, I have added the above constraints.

§4.2.1:

The creation of the EKTField MUST precede the normal SRTP packet processing.

Why? This seems unnecessary and unnecessarily complicated. If the order of operations has an impact on the bits on the wire (I don't see how it does?), then please include some explanatory text here that clarifies the reason for this constraint.

I agree. In fact, the natural order of operations is to form the SRTP packet, then compute and append the EKT tag.

I have made this change.

§4.2.1:

When a packet is sent with the ShortEKTField, the ShortEKFField is simply appended to the packet.

Nit: s/ShortEKFField/ShortEKTField/

Already fixed on master.

§4.2.1:

If the SSRC in the EKTPlaintext does not match the SSRC of the SRTP packet received, then all the information from this EKTPlaintext MUST be discarded and the following steps in this list are skipped.

I can see implementors easily interpreting this as requiring them to discard the RTP payload as well. If that's not the intention (I don't think it is), consider adding text like "The FullEKTField is removed from the packet then normal SRTP or SRTCP processing occurs."

I don't think this is unclear, but I have adapted the text to say that the implementation MAY continue normal SRTP/SRTCP processing.

§4.3:

Section 4.2.1 recommends that SRTP senders continue using an old key for some time after sending a new key in an EKT tag.

This is the first appearance of the phrase "EKT tag," which never seems to be properly defined. I presume this is meant to be the combination of the EKT Ciphertext and the SPI?

In any case, please clearly define this term somewhere, preferably before using it the first time.

I have added a definition in {{EKT}}.

§4.3:

cannot be used and they also need to create a counter that keeps track of how many times the key has been used to encrypt data to ensure it does not exceed the T value for that cipher (see ).

The parenthetical phrase appears to be missing something here.

If either of these limits are exceeded, the key can no longer be used

Nit: "...either... is exceeded..."

for encryption. At this point implementation need to either use the

Nit: "...implementations need..."

Already fixed on master.

§4.5:

If a source has its EKTKey changed by the key management, it MUST also change its SRTP master key

I suppose it's not terribly important for interop, but the implication that this change takes place immediately seems to contradict the 250 ms period specified in §4.2.1. Perhaps a few words here about how these two normative statements are intended to interact would save implementors a bit of grief.

The word "immediately" appears nowhere here. The sequencing seems clear: Receipt of a new EKTKey should trigger an update to the SRTP Master Key, but you don't start using that for 250ms.

I have edited to clarify.

§4.6:

This document defines the use of EKT with SRTP. Its use with SRTCP would be similar, but is reserved for a future specification.

After reading this far, I was quite surprised to find this qualification. If this is the intention for this document, please adjust the rest of the text to match. Some examples follow.

The following shows the syntax of the EKTField expressed in ABNF [RFC5234]. The EKTField is added to the end of an SRTP or SRTCP packet.

Rollover Counter (ROC): On the sender side, this is set to the current value of the SRTP rollover counter in the SRTP/SRTCP context associated with the SSRC in the SRTP or SRTCP packet.

The final byte is checked to determine which EKT format is in use. When an SRTP or SRTCP packet contains a ShortEKTField, the ShortEKTField is removed from the packet then normal SRTP or SRTCP processing occurs.

The reason for using the last byte of the packet to indicate the type is that the length of the SRTP or SRTCP part is not known until the decryption has occurred.

At this point, EKT processing has successfully completed, and the normal SRTP or SRTCP processing takes place.

This allows those peers to process EKT keying material in SRTP (or SRTCP) and retrieve the embedded SRTP keying material.

I have removed the remaining mentions of SRTCP.

§4.7:
To accommodate packet loss, it is
RECOMMENDED that three consecutive packets contain the
FullEKTField be transmitted.
Nit: "...containing..." (alternately, remove "be transmitted" -- both make a grammatically correct sentance)

Edited to: "it is RECOMMENDED that the FullEKTField be transmitted in three consecutive packets"

More substantially -- under "New sender:", I'm a little surprised that there isn't any mention of other senders re-keying in response to a new sender joining. In the vast majority of conferences, when a sender joins, that same entity generally will also be a receiver. It seems this should trigger other senders to include the key in their next packet.

This is covered by the "new receiver" and "rekey" cases.

I have left this as-is.

§4.7:

Rekey: By sending EKT tag over SRTP, the rekeying event shares fate with the SRTP packets protected with that new SRTP master key.

Is this actually true? Going back to the 250 ms period specified in §4.2.1, it seems that the master key is sent out in packets pretty far removed from those it actually protects.

Between this and the inconsistency I mention in §4.5 above, this increasingly feels like maybe there were two different ways of reasoning about the timing of sending a master key versus the timing of actually using it. Does the text in §4.2.1 perhaps represent an outdated notion of how this is intended to work?

As noted above, the statement in §4.5 is not inconsistent. And your notion of "fate sharing" is pretty strict here. The intent is that the keying packets are following the same general path (in terms of media servers, etc.) as the media. It's not a perfect match.

I have left this as-is.

§4.7:
If sending audio and video, the RECOMMENDED
frequency is the same as the rate of intra coded video frames.  If
only sending audio, the RECOMMENDED frequency is every 100ms.
Is this "100ms" correct? Assuming, say, the use of Opus at voice quality with 20 ms packets, this is taking packets on the order of 40 bytes in length and tacking on something like 20 to 30 bytes to every fifth packet. That's an increase in overall stream size on the order of roughly 15% to 20%.

At the same time, when using real-time video, intra frames are going to happen roughly every 500 ms to 1500 ms. If a cadence on that order is okay for audiovisual streams, I have to imagine it's okay for audio streams.

So, to clarify: is this "100ms" a typo for "1000 ms"?

I don't know the answer on the merits, but it's an application decision anyway. I have added a note that applications will need to decide what latency/bandwidth tradeoff they want to make.

§7.2:

             +----------+-------+---------------+
             | Name     | Value | Specification |
             +----------+-------+---------------+
             | AESKW128 |     1 | RFCAAAA       |
             | AESKW256 |     2 | RFCAAAA       |
             | Reserved |   255 | RFCAAAA       |
             +----------+-------+---------------+

                   Table 3: EKT Cipher Types

Section 5.2.1 reserves "0" as well. I suspect we want to replicate that reservation in this table

On master, this table has been shifted down by one, so this change is no longer needed.

percwg / perc-wg

Address comments from Adam Roach #180

The following shows the syntax of the EKTField expressed in ABNF [RFC5234]. The EKTField is added to the end of an SRTP or SRTCP packet.

Rollover Counter (ROC): On the sender side, this is set to the current value of the SRTP rollover counter in the SRTP/SRTCP context associated with the SSRC in the SRTP or SRTCP packet.

The final byte is checked to determine which EKT format is in use. When an SRTP or SRTCP packet contains a ShortEKTField, the ShortEKTField is removed from the packet then normal SRTP or SRTCP processing occurs.

The reason for using the last byte of the packet to indicate the type is that the length of the SRTP or SRTCP part is not known until the decryption has occurred.

At this point, EKT processing has successfully completed, and the normal SRTP or SRTCP processing takes place.