resolvconf: correctly parse IPv6 Scoped Address in nameserver

bikallem commented 1 year ago

This PR improves resolvconf parser error message so that it is helpful when trying to fix resolv.conf related syntax errors. The improved error message is provided in the test. Without this PR, the error message is lexing: empty token which isn't very helpful.

I experienced this issue while attempting to use my local resolv.conf(NixOs 22.05) which content is as follows:

# Generated by resolvconf
search home
nameserver 192.168.1.254
nameserver fe80::c2d7:aaff:fe96:8d82%wlp3s0
options edns0

Note the % char in line 4 which is not recognized by the lexer/parser. I am still investigating if my OS generated resolv.conf is valid or not. Regardless I think this PR helps in resolving similar issue that I faced.

bikallem commented 1 year ago

linux resolv.conf supports IPv6 Scoped Address Architecture (https://datatracker.ietf.org/doc/html/rfc4007#section-11) which is where the % character comes from.

I have updated the resolve_conf parser to not error out when encountering zone index tokens. Instead we just discard them as glibc resolv.conf implemenetation does. (https://github.com/bminor/glibc/blob/master/resolv/res_init.c#L422)

/cc @hannesm

This PR is ready for review.

hannesm commented 1 year ago

Hello,

thanks for working on this PR. I'm curious about (a) why to reject / discard that %yyy in the lexer? and (b) the handling of newlines is vastly different from the other lexer (in the zone subdir) - why? can we - at least in this repository - be consistent about the lexer usage (I don't know which way is "correct", since the other lexer was taken from the earlier DNS implementation).

Semantically, I wonder what the desired "scoped IPv6" nameserver entry should be? Should that nameserver only ever be asked via the provided interface name (in your example)? I'm rather confused why your resolv.conf contains such a scope, and wonder whether dropping it could lead to information leakage (since ocaml-dns may use a different interface than the given one to conduct DNS requests).

bikallem commented 1 year ago

I'm curious about (a) why to reject / discard that %yyy in the lexer?

It is part of IPv6 standard rather than resolv.conf, i.e fe80::1ff:fe23:4567:890a%eth2 is a valid IPv6 address. For now we discard the IPv6 scoped literal since Ipaddr can't parse it. There is existing issue for it in https://github.com/mirage/ocaml-ipaddr/issues/76. Regardless, from my initial understanding, the use case for IPv6 scoped literal is that it is used mostly in the OS/kernel/IP protocol level rather than the dns particulars(this repo).

Secondly, the resolv.conf implementation as it exists in glibc currently also discards this information when parsing /etc/resolv.conf as linked to in my previous comment. So there is a prior art sort of reason for discarding this data, i.e. since glibc does it and we do it, we are probably okay.

Finally, the current dns-client information functions okay without this data.

why? can we - at least in this repository - be consistent about the lexer usage (I don't know which way is "correct", since the other lexer was taken from the earlier DNS implementation).

I am not sure if there is an accepted standard way to handle newlines in ocamllex. Most likely a particular use case demands we handle newlines in a particular way. In the zone lexer, a data structure parserstate maintains the lineinfo (https://github.com/mirage/ocaml-dns/blob/b1ab0f5555e31c61b24d71c4b09ea5a7782f45a5/zone/dns_zone_state.ml#L24). As such it handles it accordingly. However, for the resolvconf use case, we don't define such data structure and use the one available from Lexing.lexbuf - which is adequate for the use case.

bikallem commented 1 year ago

A bit outside of the scope of this PR, but I think the resolvconf parser is bit superfluous in that we lex/parse token that we are not interested in, i.e. we only seem to need the nameserver ... bit from resolv.conf but we seem to generate tokens for a lot more. These can be safely removed. Wdyt? Is there an appetite for this?

hannesm commented 1 year ago

Secondly, the resolv.conf implementation as it exists in glibc currently also discards this information

That's not leading anywhere. Yes indeed there is lots of software which behaves strangely/weirdly. I'm not keen on adapting the behaviour based on random libc implementations (yes indeed there are others than glibc, e.g. FreeBSD libc and OpenBSD libc, but also musl libc, ...). If there is a speciication how to deal with "scoped addresses", we should follow that specification. I haven't had time to read the RFC 4007 and intended behaviour.

Was there a reason for your Linux distribution to embed the scoped identifier into /etc/resolv.conf? If yes, why? If no, why did they put it there?

Finally, the current dns-client information functions okay without this data.

As I tried to outline, this is fine for academic software, but for potentially security-relevant (leaking hostnames on an unintended network interface) pieces of software, I'd like to have a better argument than "it works fine".

which is adequate for the use case

Maybe. But maybe from a maintenance point of view, this is a burden if these lexers and parsers are behaving differently. Of course you may argue that "it works fine", but tbh I'd prefer some streamline in this repository where also in 5 years time I can easily wrap my head around what is happening.

but we seem to generate tokens for a lot more. These can be safely removed. Wdyt? Is there an appetite for this?

Maybe have a read through the old issues, such as https://github.com/mirage/ocaml-dns/issues/243 -- there's some interest in handling more features. What would be the benefit from removing it? Also, the comment in the lexer, "( inspired by https://github.com/tailhook/resolv-conf/blob/master/src/grammar.rs )", may be worth a look -- since there is no POSIX (or RFC) specification for /etc/resolv.conf, the easiest for me is to keep in sync with a grammar other languages have developed.

This also leads to the question: how does the rust implementation handle scoped addresses?

hannesm commented 1 year ago

Coming back to this issue, I'd appreciate to fix this in ocaml-ipaddr instead. There's some active discussion about the presentation of an Ipaddr.V6.t https://github.com/mirage/ocaml-ipaddr/issues/113#issuecomment-1429785250

bikallem commented 1 year ago

I did some more research on this topic recently.:

I am unable to conclude that zone index information should be part of an IPv6 address since none of the IPv6 RFCS that I managed to pore through specifies how to store the zone index information in the IPv6 address itself.
The ipv6 parsing libraries for rust, golang and libc all don't accept zone index information, i.e. they error out when attempting to do so.
Zone index are only applicable to link local addresses (fe80), therefore zone index information is not usable outside of the link.
Other projects that has to parse resolv.conf also ignores/strips away the zone index information attached to a "nameserver". In addition to glibc example I cited above, it docker also strips away the zone index information.

References:

hannesm commented 1 year ago

Thanks for your investigations. Would you mind to take a look at #334 and let me know what you think of that PR? The main difference is that the zone_id is kept for the parser and discarded there. The lexer is as well improved to know about line numbers.

hannesm commented 1 year ago

Thanks a lot, merged #334 which fixes the issue reported here.

mirage / ocaml-dns

resolvconf: correctly parse IPv6 Scoped Address in nameserver #328