Closed bikallem closed 1 year ago
linux resolv.conf supports IPv6 Scoped Address Architecture (https://datatracker.ietf.org/doc/html/rfc4007#section-11) which is where the %
character comes from.
I have updated the resolve_conf parser to not error out when encountering zone index tokens. Instead we just discard them as glibc resolv.conf implemenetation does. (https://github.com/bminor/glibc/blob/master/resolv/res_init.c#L422)
/cc @hannesm
This PR is ready for review.
Hello,
thanks for working on this PR. I'm curious about (a) why to reject / discard that %yyy
in the lexer? and (b) the handling of newlines is vastly different from the other lexer (in the zone
subdir) - why? can we - at least in this repository - be consistent about the lexer usage (I don't know which way is "correct", since the other lexer was taken from the earlier DNS implementation).
Semantically, I wonder what the desired "scoped IPv6" nameserver entry should be? Should that nameserver only ever be asked via the provided interface name (in your example)? I'm rather confused why your resolv.conf contains such a scope, and wonder whether dropping it could lead to information leakage (since ocaml-dns may use a different interface than the given one to conduct DNS requests).
I'm curious about (a) why to reject / discard that %yyy in the lexer?
It is part of IPv6 standard rather than resolv.conf, i.e fe80::1ff:fe23:4567:890a%eth2
is a valid IPv6 address. For now we discard the IPv6 scoped literal since Ipaddr
can't parse it. There is existing issue for it in https://github.com/mirage/ocaml-ipaddr/issues/76. Regardless, from my initial understanding, the use case for IPv6 scoped literal is that it is used mostly in the OS/kernel/IP protocol level rather than the dns particulars(this repo).
Secondly, the resolv.conf implementation as it exists in glibc currently also discards this information when parsing /etc/resolv.conf
as linked to in my previous comment. So there is a prior art sort of reason for discarding this data, i.e. since glibc does it and we do it, we are probably okay.
Finally, the current dns-client information functions okay without this data.
why? can we - at least in this repository - be consistent about the lexer usage (I don't know which way is "correct", since the other lexer was taken from the earlier DNS implementation).
I am not sure if there is an accepted standard way to handle newlines in ocamllex. Most likely a particular use case demands we handle newlines in a particular way. In the zone
lexer, a data structure parserstate
maintains the lineinfo (https://github.com/mirage/ocaml-dns/blob/b1ab0f5555e31c61b24d71c4b09ea5a7782f45a5/zone/dns_zone_state.ml#L24). As such it handles it accordingly. However, for the resolvconf use case, we don't define such data structure and use the one available from Lexing.lexbuf
- which is adequate for the use case.
A bit outside of the scope of this PR, but I think the resolvconf parser is bit superfluous in that we lex/parse token that we are not interested in, i.e. we only seem to need the nameserver ...
bit from resolv.conf but we seem to generate tokens for a lot more. These can be safely removed. Wdyt? Is there an appetite for this?
Secondly, the resolv.conf implementation as it exists in glibc currently also discards this information
That's not leading anywhere. Yes indeed there is lots of software which behaves strangely/weirdly. I'm not keen on adapting the behaviour based on random libc implementations (yes indeed there are others than glibc, e.g. FreeBSD libc and OpenBSD libc, but also musl libc, ...). If there is a speciication how to deal with "scoped addresses", we should follow that specification. I haven't had time to read the RFC 4007 and intended behaviour.
Was there a reason for your Linux distribution to embed the scoped identifier into /etc/resolv.conf? If yes, why? If no, why did they put it there?
Finally, the current dns-client information functions okay without this data.
As I tried to outline, this is fine for academic software, but for potentially security-relevant (leaking hostnames on an unintended network interface) pieces of software, I'd like to have a better argument than "it works fine".
which is adequate for the use case
Maybe. But maybe from a maintenance point of view, this is a burden if these lexers and parsers are behaving differently. Of course you may argue that "it works fine", but tbh I'd prefer some streamline in this repository where also in 5 years time I can easily wrap my head around what is happening.
but we seem to generate tokens for a lot more. These can be safely removed. Wdyt? Is there an appetite for this?
Maybe have a read through the old issues, such as https://github.com/mirage/ocaml-dns/issues/243 -- there's some interest in handling more features. What would be the benefit from removing it? Also, the comment in the lexer, "( inspired by https://github.com/tailhook/resolv-conf/blob/master/src/grammar.rs )", may be worth a look -- since there is no POSIX (or RFC) specification for /etc/resolv.conf, the easiest for me is to keep in sync with a grammar other languages have developed.
This also leads to the question: how does the rust implementation handle scoped addresses?
Coming back to this issue, I'd appreciate to fix this in ocaml-ipaddr instead. There's some active discussion about the presentation of an Ipaddr.V6.t https://github.com/mirage/ocaml-ipaddr/issues/113#issuecomment-1429785250
I did some more research on this topic recently.:
References:
Thanks for your investigations. Would you mind to take a look at #334 and let me know what you think of that PR? The main difference is that the zone_id is kept for the parser and discarded there. The lexer is as well improved to know about line numbers.
Thanks a lot, merged #334 which fixes the issue reported here.
This PR improves resolvconf parser error message so that it is helpful when trying to fix resolv.conf related syntax errors. The improved error message is provided in the test. Without this PR, the error message is
lexing: empty token
which isn't very helpful.I experienced this issue while attempting to use my local
resolv.conf
(NixOs 22.05) which content is as follows:Note the
%
char in line 4 which is not recognized by the lexer/parser. I am still investigating if my OS generated resolv.conf is valid or not. Regardless I think this PR helps in resolving similar issue that I faced.