grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0
11.28k stars 3.79k forks source link

Support of `dns:name` URIs #10824

Open FyiurAmron opened 6 months ago

FyiurAmron commented 6 months ago

As it currently stands, the URI validation and parsing for dns schema in grpc-java is in contradiction of both RFC-4501 and (not relevant really) core gRPC specs by rejecting dns:name-type URIs (no slash). This generates discrepancies between different gRPC implementations and is generally problematic. Currently we have 6 (yes, literally six) competing standards as to what URIs are allowed, none of which are actually aligned, and all of them are "official" in some way. The only thing that's missing for grpc-java to be a strict superset here (and thus reduce the incompatibilities and confusion) is to have dns:name allowed as per RFC-4501 spec (ditto).

A fully backwards-compatible solution would be to change https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/DnsNameResolverProvider.java#L56 to allow any reasonable/valid RFC-4501-compliant (ditto) URI to be accepted. Code-wise, it's trivial, has no parsing ambiguities and has no negative impact on the existing code solutions.

Describe the solution you'd like

dns:name would be included as valid , RFC-4501-type URIs would be better (ditto) and supported, confusion and discrepancies would be reduced.

Additional context

https://github.com/grpc/grpc/issues/35539 & https://github.com/grpc-ecosystem/grpc-spring/issues/1024 If a PR is needed, I'll gladly do it.

larry-safran commented 5 months ago

Scheduled for discussion in the maintainer API review meeting.

ejona86 commented 5 months ago

Code-wise, it's trivial, has no parsing ambiguities and has no negative impact on the existing code solutions.

Not for a RFC 2396 URI parser, which is what java.net.URI is. RFC 3986 redefined URIs in such a way that it was essentially impossible for many pre-existing systems to adopt it. Even with Go's URI parser, which claims to be based on RFC 3986, has the same trouble here as java.net.URI. Both use the RFC 2396 semantics that a scheme is either hierarchical (foo://bar or foo:/bar) or not (foo:bar), and non-hierarchical schemes are essentially unparsed by the URI. It essentially requires using a completely different URI parser.

grpc-java is in contradiction of both RFC-4501

No gRPC implementation supports RFC 4501. RFC 4501:

  1. can only load a single resource, but gRPC loads A, AAAA, and sometimes SRV and TXT
  2. considers host. and host equivalent, never searching domains. gRPC definitely searches for hosts not ending in .
  3. can't handle host:port; it would look up "host:port" in the DNS system, which generally won't exist
  4. can't handle IP addresses; it would look up the IP in the DNS system as if it were a host name, which generally won't exist

The only thing you can do by mentioning RFC 4501 is argue that gRPC should have used a different scheme. But that ship has sailed, and mostly nobody cares about RFC 4501.

FyiurAmron commented 5 months ago

Not for a RFC 2396 URI parser, which is what java.net.URI is.

Well, yes. That only proves that java.net.URI is not a future-proof parser that should be depended on in the long run in this case :)

It essentially requires using a completely different URI parser.

Well, yes, but switching the URI parser would basically be the only requirement for the change here (AFAIK), and is, arguably, trivial by itself, since whether java.net.URI is used for that parsing or not is just an implementation detail for grpc-java (AFAIK).

Deciding on the new parser to use (or writing it, worst case scenario) is another thing, and I agree that can (or not, depending on the detail) a non-trivial issue.

The only thing you can do by mentioning RFC 4501 is argue that gRPC should have used a different scheme. But that ship has sailed, and mostly nobody cares about RFC 4501.

True enough. I amended this request accordingly. This doesn't change the core argument here - being more compliant to RFC-4501-like dns:name URL/URIs (however we call them) is still a bonus.

larry-safran commented 5 months ago

@FyiurAmron are you essentially asking support non-hierarchical URIs and all of the discussion about standards is justification for it, but not really relevant to your needs?

If so, why do you need the non-hierarchical URI?

FyiurAmron commented 5 months ago

@larry-safran

all of the discussion about standards is justification for it, but not really relevant to your needs?

Yes, that's correct.

If so, why do you need the non-hierarchical URI?

honestly? I don't need them in the strict sense, because they can be replaced by dns:///name in grpc-java (which is my actual use case). I do however both agree with @markdroth that this is something that POLA would suggest having and is good UX-wise, and I honestly believe that with the mishmash of URL/URIs formats supported or documented by every platform/source, having a reasonable common canon that would be present and accepted everywhere where possible would be prudent. I also think that aligning current grpc-java behaviour with grpc-C and grpc core docs would benefit (or at least not hurt) everyone in the long run, and would make maintaining integrations simpler by not having to document or work around the caveats and discrepancies in such detail (again, by the virtue of having a reasonable middle ground).

As an aside: I went through this because I tried to make sense of the docs in the libs I use (mostly grpc-spring -> grpc-java ATM), and it actually required me to talk to people on three different repos to learn stuff that is basically still tribal knowledge as the result of mostly arbitrary decisions. While I'm in no position to question those decisions in the historic sense, since they obviously had a solid background that made them be, after talking with my co-workers and people on the 'net I realized I'm not alone in being lost here with what actually is "valid" where and why, and that the outcome of current situation is just by itself confusing to people (including me). Like I wrote in the first post here, I'm willing to put the required effort myself, because I think that this is a solvable engineering problem that doesn't need to be anymore, especially in 2024.

ejona86 commented 5 months ago

Most of the docs seem consistent and use dns:///host. But yes, C core's docs use dns:host. If that's not the case, you can link to the other docs and I'll look into them.

The basic premise is C added support for something that isn't easily provided by other languages. Using dns:///foo works in all languages without any surprises. dns:/foo is probably fine in all languages, simply because of URI semantics. I do not recommend using dns:foo, because support is varied. Even in languages that claim to support it, like grpc-go, you may find the implementation is broken (e.g., it is broken for IPv6, except you can maybe make it work if you use an invalid URI, but that isn't then likely cross-language).

Regarding tribal knowledge, I thought Java's docs were reasonable.

URI is part of our NR API. I am doubtful that using another library will be a good road, but it is mostly a theoretical discussion as we aren't discussing concrete implementations. There may be some hacks where you use URI, but we need to make sure to handle query strings, fragments, and percent encoding the same between dns:/// and bare dns:.

Also, this issue is not limited to just dns. xds is another case.

FyiurAmron commented 5 months ago

@ejona86

Most of the docs seem consistent and use dns:///host. But yes, C core's docs use dns:host. If that's not the case, you can link to the other docs and I'll look into them.

gRPC PHP: https://grpc.github.io/grpc/php/md_doc_naming.html

Most gRPC implementations support the following URI schemes: dns:[//authority/]host[:port] – DNS (default)

This is copy-pasted from Core, and I guess it makes sense since PHP implementation is just a wrapper around a binary lib from C++ code, but that's my biggest gripe here in a way:

gRPC Core says: https://grpc.github.io/grpc/core/md_doc_naming.html

Most gRPC implementations support the following URI schemes: dns:[//authority/]host[:port] – DNS (default)

Most. TBH, if we could agree that's actually not the case, i.e. that most gRPC implementations actually DON'T support dns:[//authority/]host[:port] but only dns:[//authority]/host[:port] - at least maybe we could try to push for Core and PHP to fix what's written there and change the exact wording to either cover just the C++ implementation (i.e. "C++ gRPC implementation supports the following URI schemes:" etc)?

Maybe those docs should be considered obsolete and removed? I mean, they seem autogenerated but still updated; yet http://grpc.github.io/ and https://github.com/grpc/grpc.github.io/ do carry the implication they shouldn't exist anymore?

The basic premise is C added support for something that isn't easily provided by other languages. Using dns:///foo works in all languages without any surprises. dns:/foo is probably fine in all languages, simply because of URI semantics. I do not recommend using dns:foo, because support is varied. Even in languages that claim to support it, like grpc-go, you may find the implementation is broken (e.g., it is broken for IPv6, except you can maybe make it work if you use an invalid URI, but that isn't then likely cross-language).

I understand the caveats. However, see above. I'll reiterate: the reasons you provide for not having dns:host are valid enough by themselves. However, what you say only proves that the current core (and PHP, which clones it) docs are wrong. Like I said in the previous ticket ( https://github.com/grpc/grpc/issues/35539 ) - even if we could agree that it would be nice to support dns:host for whatever reason (and yeah, I still think it would be nice by itself), it doesn't seem to be supported widely at the moment, if at all. C# docs speak only about dns:///. Go docs as well. Docs for other languages are lacking, to say the least.

Regarding tribal knowledge, I thought Java's docs were reasonable.

Honestly? That's basically tribal level if we're talking about a lib such widespread as gRPC. A Javadoc for a very specific method for a very specific class located in a very specific place. Info provided such as this, when contradicting other official source in any way, forces you to seek "those who know what and why", and that's tribal knowledge, by definition. I'm not blaming you nor anyone. I'm just stating the fact that conflicting information and opinions create a localized (hence "tribal") sources of truth.

URI is part of our NR API.

A very good point.

I am doubtful that using another library will be a good road, but it is mostly a theoretical discussion as we aren't discussing concrete implementations. There may be some hacks where you use URI, but we need to make sure to handle query strings, fragments, and percent encoding the same between dns:/// and bare dns:.

Also, this issue is not limited to just dns. xds is another case.

Another very good point.

However, I still stand by what I said - the fix for Java is trivial. Since dns:name etc. is basically valid URI but it doesn't parse into proper parts, the example solution is to, in e.g. newNameResolver, between lines 54 and 55:

That's literally 3 lines of code, with no real performance downside, and with a straightforward logic - both easy to maintain and easy to understand. And yes, it's a workaround for the parser, and should be documented as such in code. Yet it works, and I see no actual possible problems or corner cases with it. It would also work for any other similar case, be it xds or whatnot. It's not a perfect solution, but it's an actual engineering solution, a manual extension to the parsing logic. It boils down the problem down to "should we do it?" instead of "can we do it?" or "how we do it?". And yes, I'm not saying "this is a solution that should be used". I'm just saying that ATM it's completely doable, with near-zero effort (like I said, I can do the PR etc. myself, this is nowhere near the gargantuan code blocks I have and had to deal with, rewrite and fix in my daily job).

I mean, I'm more than willing to concede and say "yeah, well, we have dns:/// and it's working, maybe it's ugly but ubiquitous and works everywhere". However, that would essentially imply that what I did put in the original ticket was right - that the template string provided in the core/PHP docs is wrong, because it works only for that particular implementation, and literally nowhere else.

Like I said, I'm just seeking closure w.r.t. this. Believe me or not, my first ticket was to clarify and fix the docs, not turn the world upside down. I still stand by my initial action and would appreciate any kind of support.