Benjamin Kaduk's ballot comments

huitema commented 2 years ago

Benjamin Kaduk has entered the following ballot position for draft-ietf-dprive-dnsoquic-10: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-dprive-dnsoquic/

DISCUSS:

I have a 0-RTT-related topic that I'd like to discuss, as the current situation isn't entirely clear to me. In particular, TLS 1.3 provides (and QUIC inherits) a mechanism for a server to advertise that it just does not support 0-RTT at all, via the (absence of the) "early_data" extension. This meshes nicely with the guidance in RFC 8446 that 0-RTT is to only be used cautiously, and only with specific request from the application. However, this specification diverges from that requirement for application opt-in (per §9.1), and so when I read the directive in §5.5 that "servers MUST adopt one of the following behaviors", I am forced to wonder if the absence of an "abort the connection, because you do not enable early data at all" option is intended to forbid a server from taking that approach and thus require servers to implement and enable 0-RTT at runtime. I hope that the intent was just for the §5.5 listing to be predicated on the server using 0-RTT at all, but it's hard to reach that conclusion from the existing text, so I have to seek clarification.

COMMENT:

Thanks to Phillip Hallam-Baker for the secdir review. I did want to reiterate one of his comments, regarding the potential for harmful interaction between use of DoQ (or really, any encrypted DNS transport) and captive portals. While this would accordingly have been best placed in something generic to DNS privacy mechanisms, such as RFC 9076 or RFC 8932, I think there might still be room to mention it here. I could attempt to craft some text, if there is interest.

I made a pull request with some editorial suggestions at https://github.com/huitema/dnsoquic/pull/154

Section 1

The specific non-goals of this document are: [...]

No attempt to support server-initiated transactions, which are used only in DNS Stateful Operations (DSO) [RFC8490].

RFC 8490 is a proposed standard, so excluding it maybe is a bit in conflict with claiming that this is a "general-purpose transport for DNS", absent some other argument that DSO is a special-purpose tool.

Section 5.1.2

DoQ connections MUST NOT use UDP port 53. This recommendation against use of port 53 for DoQ is to avoid confusion between DoQ and the use of DNS over UDP [RFC1035].

Just to clarify: this prohibition is intended to apply even if there would otherwise be mutual agreement to use port 53?

Section 5.2

DNS traffic follows a simple pattern in which the client sends a query, and the server provides one or more responses (multiple responses can occur in zone transfers).

Is this true even for DSO server-initiated transactions?

The client MUST select the next available client-initiated bidirectional stream for each subsequent query on a QUIC connection, in conformance with the QUIC transport specification [RFC9000].

Just to note: RFC 9000 does not require the client to use the "next available" stream, instead saying that "a stream ID that is used out of order results in all streams of that type with lower- numbered stream IDs also being opened". So this "MUST select the next available" is a new requirement for DoQ, and it's not entirely clear to me that it's required for interop (though it is more efficient than any alternatives).
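
To make the stream numbering concrete, here is a minimal sketch of what "next available" means for client-initiated bidirectional streams, and of which streams are implicitly opened when an ID is used out of order. The function names are hypothetical and not taken from the draft or from any QUIC library; the only facts assumed are that the two low-order bits of a stream ID encode its type (0x0 for client-initiated bidirectional, per RFC 9000 Section 2.1), so consecutive IDs of one type are 4 apart.

```python
from itertools import count

# Low two bits of a client-initiated bidirectional stream ID (RFC 9000, Section 2.1).
CLIENT_BIDI = 0x0


def client_bidi_stream_ids():
    """Yield client-initiated bidirectional stream IDs in increasing order: 0, 4, 8, ..."""
    for n in count():
        yield (n << 2) | CLIENT_BIDI


def implicitly_opened(used_id, highest_opened=None):
    """Return the lower-numbered client bidi stream IDs that are also opened
    when `used_id` is used out of order.

    `highest_opened` is the highest client bidi stream ID already open,
    or None if no such stream has been opened yet.
    """
    start = CLIENT_BIDI if highest_opened is None else highest_opened + 4
    return list(range(start, used_id, 4))
```

With stream 0 already open, using stream 12 next would also open streams 4 and 8, and those count against the peer's stream limit, which is the efficiency concern mentioned above.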

Section 5.2.1

This has implications for proxying DoQ message to and from other transports. For example, proxies may have to manage the fact that DoQ can support a larger number of outstanding queries on a single connection than e.g., DNS over TCP because DoQ is not limited by the Message ID space. This issue already exists for DoH, where a Message ID of 0 is recommended.

I'm not sure how often this motivating text is relevant. The ID field seems to be 16 bits, thus enabling 65k outstanding queries on a single connection -- how often is there a need to have that many queries outstanding at once? It looks like the motivation presented in RFC 8484 for setting the ID to zero is to improve caching, as otherwise queries identical at the DNS level would be cached as separate requests by HTTP. I agree, of course, that the ID field is redundant with the QUIC stream ID and that it should be set to zero, I am just not sure if the number of outstanding queries is a relevant motivation for doing so.

(It also looks like RFC 8484 refers to this value as the "DNS ID" rather than "Message ID". I guess our options for consistent terminology are somewhat limited, though.)
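
For reference, the arithmetic behind the "65k" figure and the zeroing of the ID can be sketched as follows. This is an illustration only: `set_message_id` is a hypothetical helper, and the one fact it relies on is that the ID is the first 16-bit field of the 12-octet DNS header (RFC 1035).

```python
import struct

# A 16-bit ID field allows at most this many distinct outstanding queries
# on a transport that relies on the ID to match responses to queries.
MAX_DISTINCT_IDS = 2 ** 16

DNS_HEADER_LEN = 12  # fixed-size DNS header, RFC 1035


def set_message_id(dns_message: bytes, new_id: int = 0) -> bytes:
    """Return a copy of a DNS message with the 16-bit ID field replaced.

    The default of 0 matches the value discussed above for DoQ and DoH.
    """
    if len(dns_message) < DNS_HEADER_LEN:
        raise ValueError("message shorter than the 12-octet DNS header")
    if not 0 <= new_id < MAX_DISTINCT_IDS:
        raise ValueError("ID must fit in 16 bits")
    return struct.pack("!H", new_id) + dns_message[2:]
```

A proxy relaying between DoQ and DNS over TCP would call something like this in both directions: zeroing the ID towards DoQ, and assigning a locally unique ID towards TCP.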

Section 5.3

The following error codes are defined for use when abruptly terminating streams, aborting reading of streams, or immediately closing connections:

Should we say that these are what QUIC calls "application error code"s? (Subsequent occurrences of the phrase "error code" might be modified to "application error code" as well.)

Section 5.3.2

set to DOQ_INTERNAL_ERROR. [...]

Is there any further guidance to give on when a DNS SERVFAIL response vs QUIC RESET_STREAM is preferred (or is the guidance really always to issue RESET_STREAM)?

Section 5.3.3

It is noted that the restrictions on use of the above EDNS(0) options has implications for proxying message from TCP/DoT/DoH over DoQ.

Was it already rejected to spend a sentence mentioning that such proxying would involve translating the messages per the needs of the different protocols on the different connections?

Section 5.5

Servers MUST NOT execute non replayable transactions received in 0-RTT data. Servers MUST adopt one of the following behaviors:

I think we should clarify whether "execute" means "take any action in response to" or just "send a response message for". (I think it needs to be the former.)

Section 6.4

Implementations MUST protect against the traffic analysis attacks described in Section 9.5 by the judicious injection of padding. This

I think this is already overtaken by events, but a MUST-level requirement seems overbearing here. My understanding is that providing complete protection against these types of attack is still an open research question....

could be done either by padding individual DNS messages using the EDNS(0) Padding Option [RFC7830] or by padding QUIC packets (see Section 8.6 of [RFC9000], the QUIC transport specification.

There is no Section 8.6 in RFC 9000.
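
For the first of the two options in the quoted text (padding individual DNS messages), the length computation might look like the sketch below. The 128-octet block size is an assumption borrowed from the query padding recommendation in RFC 8467, not something the draft text quoted here specifies, and `padding_option` is a hypothetical helper. It also assumes the message already carries an OPT record to which the option is appended, and that `message_len` counts everything except the new option.

```python
import struct

PADDING_OPTION_CODE = 12  # EDNS(0) Padding option code, RFC 7830
OPTION_HEADER_LEN = 4     # 2-octet option code + 2-octet option length


def padding_option(message_len: int, block_size: int = 128) -> bytes:
    """Build a Padding option that rounds the total message length
    (message plus this option) up to a multiple of `block_size`."""
    if message_len < 0 or block_size <= 0:
        raise ValueError("lengths must be positive")
    unpadded = message_len + OPTION_HEADER_LEN
    pad_len = (-unpadded) % block_size  # zero when already on a block boundary
    # Padding octets are all zero here.
    return struct.pack("!HH", PADDING_OPTION_CODE, pad_len) + b"\x00" * pad_len
```

Block padding of this kind hides the exact message length but not timing or message counts, which fits the point above that complete protection is still an open question.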

Section 6.5.2

Clients that want to maintain long duration DoQ connections SHOULD use the idle timeout mechanisms defined in Section 10.1 of [RFC9000], the QUIC transport specification. Clients and servers MUST NOT send the edns-tcp-keepalive EDNS(0) Option [RFC7828] in any messages sent on a DoQ connection (because it is specific to the use of TCP/TLS as a transport).

Should we make some statement (analogous to what RFC 7828 does) that if such an option is received it MUST be ignored? In the absence of such guidance I can imagine implementors feeling a need to enforce the "MUST NOT send" on the receiving end.

Section 6.7

[RFC9103] specifies zone transfer over TLS (XoT) and includes updates to [RFC1995] (IXFR), [RFC5936] (AXFR) and [RFC7766]. [...]

I note that there is currently no "Updates:" header to indicate this relationship.

DoQ implementations SHOULD
- use the same QUIC connection for both AXFR and IXFR requests to the same primary
- pipeline such requests (if they pipeline XFR requests in general) and MAY intermingle them
- send the response(s) for each request as soon as they are available i.e. responses MAY be sent intermingled

Given the "SHOULD use the same QUIC connection", what does MAY-level guidance to "intermingle such requests" mean, in a QUIC context? Each DoQ request is on a separate QUIC stream, so I do not see any opportunity for intermingling other than by virtue of being in the same QUIC connection, which is already a SHOULD. This is in contrast to a TCP or TLS situation, where there is only a single data stream and intermingling has some natural meaning (or meanings, for the response case specifically, where it might apply to overall responses (composed of multiple response messages) or individual response messages).

Section 8

The discussion in §6.5.2 about resource management could be security relevant at times, if we wanted to backreference it.

The security considerations of DoQ should be comparable to those of DoT [RFC7858]. DoT as specified in [RFC7858] only addresses the stub

The security considerations section of RFC 7858 includes a MUST-level requirement to adhere to the recommendations of BCP 195. Does such a MUST-level requirement apply to DoQ as well? (I note that BCP 195 is currently listed as only an informative reference, which would need to change if a MUST-level requirement was added.)

to recursive resolver scenario, but the considerations about person- in-the-middle attacks, middleboxes and caching of data from clear text connections also apply for DoQ to the resolver to authoritative server scenario. [...]

RFC 7858 also lists a fourth consideration, traffic analysis or side-channel leaks. Do we want to forward-reference §9.5 for completeness (or even take the secdir reviewer's suggestion of coalescing the privacy considerations into the security considerations section as confidentiality considerations)?

Section 9.1

The prevention on allowing replayable transactions in 0-RTT data expressed in Section 5.5 blocks the most obvious risks of replay

Is the parity of negations correct here ("prevention on allowing")? I see §5.5 prohibiting execution of non-replayable transactions in 0-RTT data, i.e., allowing replayable ones.

Section 10.4

Provisional reservations share the range of values larger than 0x3f with some permanent registrations. This is by design, to enable conversion of provisional registrations into permanent registrations without requiring changes in deployed systems. (This design is aligned with the principles set in Section 22 of [RFC9000].)

Do we want to specifically call out the guidance on selecting specific codepoints from §22.1.2 of RFC 9000? (Or is it seen as not applicable here?)

Section 12.1

We currently only specifically reference RFC 6891 in one place, to mention that its provision for specifying maximum UDP message size is not relevant for DoQ. However, since we do define and require (in some cases) use of a new "Too Early" EDNS(0) error code, it seems that the solution should be to reference it from more places, rather than to demote it to an informative reference.

Similarly, we only reference RFC 8914 in the IANA considerations where we allocate the codepoint, and would likely benefit from sprinkling an additional reference or two in the main body of the text.

RFC 7828, on the other hand, seems to only be mentioned to say that you MUST NOT use it, which would probably be fine as an informative reference.

RFC 7873 is referenced for "similar to the DNS Cookies mechanism", which also sounds solely informative.

[I-D.ietf-dnsop-rfc8499bis] Hoffman, P. and K. Fujiwara, "DNS Terminology", Work in Progress, Internet-Draft, draft-ietf-dnsop-rfc8499bis-03,

It's kind of surprising to see DoQ electing to take a normative dependency on this draft that is not even in WGLC yet. Wouldn't that risk incurring substantial (unbounded) delays?

Section 12.2

A SHOULD-level requirement to implement the anti-replay mechanisms from RFC 8446 seems to promote it to normative status, per https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/

huitema commented 2 years ago

The 0-RTT DISCUSS issue is addressed in PR #158. Editorial comments in PR #154 have been approved.

huitema commented 2 years ago

Most remaining issues are addressed in PR #166. The following points are debatable:

Thanks to Phillip Hallam-Baker for the secdir review. I did want to reiterate one of his comments, regarding the potential for harmful interaction between use of DoQ (or really, any encrypted DNS transport) and captive portals. While this would accordingly have been best placed in something generic to DNS privacy mechanisms, such as RFC 9076 or RFC 8932, I think there might still be room to mention it here. I could attempt to craft some text, if there is interest.

This is not addressed in the new draft. We are very reluctant to start documenting this very specific deployment issue in a transport draft. There is work in progress in the ADD WG, which will address DoH as well as DoQ. Maybe we should just wait for that.

Section 1

The specific non-goals of this document are: [...]

No attempt to support server-initiated transactions, which are used only in DNS Stateful Operations (DSO) [RFC8490].

RFC 8490 is a proposed standard, so excluding it maybe is a bit in conflict with claiming that this is a "general-purpose transport for DNS", absent some other argument that DSO is a special-purpose tool.

DSO is a special-purpose tool because it defines a new state model for a session-based connection that overrides RFC 7766 (the default behaviour for DNS-over-TCP), and that new state model is what enables server-initiated transactions. To our knowledge it has only been implemented for DNS Service Discovery (which drove its initial development) and is not used for any of the scenarios covered in this draft.

The client MUST select the next available client-initiated bidirectional stream for each subsequent query on a QUIC connection, in conformance with the QUIC transport specification [RFC9000].

Just to note: RFC 9000 does not require the client to use the "next available" stream, instead saying that "a stream ID that is used out of order results in all streams of that type with lower- numbered stream IDs also being opened". So this "MUST select the next available" is a new requirement for DoQ, and it's not entirely clear to me that it's required for interop (though it is more efficient than any alternatives).

Opening streams in order is definitely best practice. Not doing so interferes with mechanisms limiting the number of open streams. The new draft clarifies that the server should not enforce in-order processing: queries may arrive out of order due, for example, to packet losses and retransmissions.

Section 5.2.1

This has implications for proxying DoQ message to and from other transports. For example, proxies may have to manage the fact that DoQ can support a larger number of outstanding queries on a single connection than e.g., DNS over TCP because DoQ is not limited by the Message ID space. This issue already exists for DoH, where a Message ID of 0 is recommended.

I'm not sure how often this motivating text is relevant. The ID field seems to be 16 bits, thus enabling 65k outstanding queries on a single connection -- how often is there a need to have that many queries outstanding at once? It looks like the motivation presented in RFC 8484 for setting the ID to zero is to improve caching, as otherwise queries identical at the DNS level would be cached as separate requests by HTTP. I agree, of course, that the ID field is redundant with the QUIC stream ID and that it should be set to zero, I am just not sure if the number of outstanding queries is a relevant motivation for doing so.

(It also looks like RFC 8484 refers to this value as the "DNS ID" rather than "Message ID". I guess our options for consistent terminology are somewhat limited, though.)

There was a fair bit of discussion about that in reviews, and the current text is the result of these discussions. And yes, RFC 8484 also zeroes "the ID in the DNS header" -- which is how it is defined in RFC 1035.

Section 5.3.3

It is noted that the restrictions on use of the above EDNS(0) options has implications for proxying message from TCP/DoT/DoH over DoQ.

Was it already rejected to spend a sentence mentioning that such proxying would involve translating the messages per the needs of the different protocols on the different connections?

The current text is already the result of many discussions...

Section 6.4

Implementations MUST protect against the traffic analysis attacks described in Section 9.5 by the judicious injection of padding. This

I think this is already overtaken by events, but a MUST-level requirement seems overbearing here. My understanding is that providing complete protection against these types of attack is still an open research question....

There was pretty strong consensus on "must do something", knowing full well that it is not perfect.

Section 6.5.2

Clients that want to maintain long duration DoQ connections SHOULD use the idle timeout mechanisms defined in Section 10.1 of [RFC9000], the QUIC transport specification. Clients and servers MUST NOT send the edns-tcp-keepalive EDNS(0) Option [RFC7828] in any messages sent on a DoQ connection (because it is specific to the use of TCP/TLS as a transport).

Should we make some statement (analogous to what RFC 7828 does) that if such an option is received it MUST be ignored? In the absence of such guidance I can imagine implementors feeling a need to enforce the "MUST NOT send" on the receiving end.

It is already specified as an error condition in {{Protocol-Errors}}, so yes, implementers are absolutely going to enforce "MUST NOT send." No ambiguity there.
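
As an illustration of what enforcing the "MUST NOT send" on the receiving end involves, a receiver could scan the EDNS(0) options of an incoming message as sketched below. The helper names are hypothetical; the sketch assumes the OPT record's RDATA has already been extracted from the message, and relies only on the EDNS(0) option layout (2-octet code, 2-octet length, data) and on option code 11 being assigned to edns-tcp-keepalive by RFC 7828.

```python
import struct

EDNS_TCP_KEEPALIVE = 11  # option code assigned by RFC 7828


def edns_option_codes(opt_rdata: bytes):
    """Yield the option codes found in the RDATA of an OPT record,
    which is a sequence of (code, length, data) entries."""
    offset = 0
    while offset < len(opt_rdata):
        if offset + 4 > len(opt_rdata):
            raise ValueError("truncated EDNS option header")
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        offset += 4
        if offset + length > len(opt_rdata):
            raise ValueError("truncated EDNS option data")
        yield code
        offset += length


def carries_tcp_keepalive(opt_rdata: bytes) -> bool:
    """True if the OPT RDATA contains the edns-tcp-keepalive option."""
    return EDNS_TCP_KEEPALIVE in edns_option_codes(opt_rdata)
```

A receiver that gets `True` here would treat the message as the error condition mentioned above; how it then closes the stream or connection is left to the draft's error handling section.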

Section 6.7

[RFC9103] specifies zone transfer over TLS (XoT) and includes updates to [RFC1995] (IXFR), [RFC5936] (AXFR) and [RFC7766]. [...]

I note that there is currently no "Updates:" header to indicate this relationship.

It seems it does. Looking at https://www.ietf.org/rfc/rfc9103.txt, the header includes an update line.

The discussion in §6.5.2 about resource management could be security relevant at times, if we wanted to backreference it.

The security considerations of DoQ should be comparable to those of DoT [RFC7858]. DoT as specified in [RFC7858] only addresses the stub

The QUIC security considerations include a discussion of Slowloris attacks (Section 21.6). Isn't that sufficient?

RFC 7858 also lists a fourth consideration, traffic analysis or side-channel leaks. Do we want to forward-reference §9.5 for completeness (or even take the secdir reviewer's suggestion of coalescing the privacy considerations into the security considerations section as confidentiality considerations)?

Maybe not. The draft does have a section about traffic analysis and mitigations, which covers DoQ-specific issues. Side-channel discussions could easily diverge into a rat-hole, with few actionable results. Then we would have to distinguish between voluntary side channels, such as emitting a series of queries with very specific timing, and involuntary side channels, in which a third party tweaks the messages to carry some signal. The former is not really actionable, and the latter is mostly a problem for QUIC itself rather than DoQ.

Do we want to specifically call out the guidance on selecting specific codepoints from §22.1.2 of RFC 9000? (Or is it seen as not applicable here?)

Not really applicable. 22.1.2 is concerned with the extra overhead caused by long numbers. This is mostly an issue for frequently used code points, like frame types, which could be used on every packet. We only have code points for error conditions, and it doesn't matter very much whether those code points encode in 1, 2, 4 or even 8 bytes.
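
For readers who want to check the sizes, the QUIC variable-length integer encoding (RFC 9000, Section 16) can be sketched as follows. The function names are hypothetical. Values up to 0x3f fit in one byte, which is why that value marks the boundary of the range mentioned in the quoted Section 10.4 text.

```python
def varint_len(value: int) -> int:
    """Number of bytes the QUIC variable-length integer encoding uses for `value`."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("QUIC varints carry values in [0, 2**62 - 1]")
    if value < 1 << 6:
        return 1
    if value < 1 << 14:
        return 2
    if value < 1 << 30:
        return 4
    return 8


def encode_varint(value: int) -> bytes:
    """Encode `value` with the 2-bit length prefix in the most significant bits."""
    length = varint_len(value)
    prefix = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}[length]
    return (value | (prefix << (8 * length - 2))).to_bytes(length, "big")
```

So an error code of 0x3f costs one byte on the wire and 0x40 costs two, a difference that only matters for code points sent very frequently.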

kaduk commented 2 years ago

Thanks for all the commentary here, I appreciate all the responses even if I will select just a few to specifically reply to.

The specific non-goals of this document are: [...]

No attempt to support server-initiated transactions, which are used only in DNS Stateful Operations (DSO) [RFC8490]. RFC 8490 is a proposed standard, so excluding it maybe is a bit in conflict with claiming that this is a "general-purpose transport for DNS", absent some other argument that DSO is a special-purpose tool.

DSO is a special-purpose tool because it defines a new state model for a session-based connection that overrides RFC 7766 (the default behaviour for DNS-over-TCP), and that new state model is what enables server-initiated transactions. To our knowledge it has only been implemented for DNS Service Discovery (which drove its initial development) and is not used for any of the scenarios covered in this draft.

That convinces me. Thanks.

This has implications for proxying DoQ message to and from other transports. For example, proxies may have to manage the fact that DoQ can support a larger number of outstanding queries on a single connection than e.g., DNS over TCP because DoQ is not limited by the Message ID space. This issue already exists for DoH, where a Message ID of 0 is recommended.

I'm not sure how often this motivating text is relevant. The ID field seems to be 16 bits, thus enabling 65k outstanding queries on a single connection -- how often is there a need to have that many queries outstanding at once? It looks like the motivation presented in RFC 8484 for setting the ID to zero is to improve caching, as otherwise queries identical at the DNS level would be cached as separate requests by HTTP. I agree, of course, that the ID field is redundant with the QUIC stream ID and that it should be set to zero, I am just not sure if the number of outstanding queries is a relevant motivation for doing so.

There was a bit of discussion later on about the terminology used to describe the ID field, and the requirement to zero it, as having gotten extensive WG discussion. I just want to highlight here that my primary comment relates to the text about "limited by the Message ID space". I think the actual behaviors specified are good, I'm just not sure whether the number of queries outstanding on a connection is ever actually limited by the Message ID space in practice. (But I also am not going to insist on any change; I comment here only to ensure that my intent was understood.)

[RFC9103] specifies zone transfer over TLS (XoT) and includes updates to [RFC1995] (IXFR), [RFC5936] (AXFR) and [RFC7766]. [...] I note that there is currently no "Updates:" header to indicate this relationship.

It seems it does. Looking at https://www.ietf.org/rfc/rfc9103.txt, the header includes an update line.

Oops, that's my mistake. I was reading too fast and misread the quoted bit as saying that dnsoquic was including updates to the listed RFCs. dnsoquic has no Updates: headers for those RFCs (which is correct, because RFC 9103 should and does have them instead).

The security considerations of DoQ should be comparable to those of DoT [RFC7858]. DoT as specified in [RFC7858] only addresses the stub

The QUIC security considerations include a discussion of Slowloris attacks (Section 21.6). Isn't that sufficient?

I think something got jumbled here and I'm not sure what the intent of the reply was. It looks like my question was whether we intended the BCP 195 guidance to apply to DoQ, and if so, whether that should be mentioned specifically.

huitema commented 2 years ago

On the 65K limit: I have seen traces of DNS root servers in which some individual IP addresses were sending tens of thousands of messages per second. I have also seen traces showing big clusters of servers with addresses in the same /24 or /48. It is not hard to imagine such resolvers eventually hitting a 65K limit. Granted, that's a bit of a stretch today, but it could happen. Those are also the resolvers most worried about the Kaminsky attack. We see a variety of tactics used to blunt such attacks, but QUIC would be a very good fit, eventually, once the tech is proven.

The intro of BCP 195 says "It is expected that the TLS 1.3 specification will resolve many of the vulnerabilities listed in this document." QUIC embeds TLS 1.3 and has no mechanism to negotiate down to TLS 1.2 or others. I don't think that an explicit reference to BCP 195 is necessary.

But yes, things got jumbled. The Slowloris mention was a response to something else:

The discussion in §6.5.2 about resource management could be security relevant at times, if we wanted to backreference it.

I looked at whether I could work a reference to 6.5.2 into the security section, but it did not really seem to fit. Plus, the relevant attacks really are variants of Slowloris and similar denial-of-service attacks, which are already addressed in the QUIC security review.

kaduk commented 2 years ago

Okay, thanks. I think we can consider these all resolved, then.

huitema / dnsoquic

Benjamin Kaduk's ballot comments #156
