openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.
Apache License 2.0
2.36k stars 714 forks source link

Add HttpRequest.host() #950

Open codefromthecrypt opened 5 years ago

codefromthecrypt commented 5 years ago

This change follows from https://github.com/openzipkin/zipkin/issues/2167

The http host/authority is not always located in headers, at least not at all times. For example, some clients have static configuration regarding the host they will send to, and that is available prior to adding a header.

Especially if we add outbound-specific configuration (ex #949), we may want to expose the ability to generically get the host/authority of a request independently from the url and without header heuristics.

cc @kojilin @trustin

codefromthecrypt commented 4 years ago

Here's a pending spike

  /**
   * The "host" or "host:port" sent by the client. By default, this returns the <a
   * href="https://tools.ietf.org/html/rfc7230#section-5.4">host</a> {@link #header(String)}, but it
   * could also be the SNI server name presented during the SSL handshake by the client.
   *
   * <p>This is often used as a {@link Span#remoteServiceName(String) remote service name} on
   * {@link HttpClientRequest client spans}. It can also be used as a {@linkplain HttpTags#AUTHORITY
   * tag} or used in {@linkplain HttpRuleSampler sampling}.
   *
   * <p><em>Note</em>: This is an invalid remote service name for server spans, as it doesn't
   * identify the client.
   *
   * @see HttpRequest#path()
   * @see HttpTags#AUTHORITY
   * @since 5.13
   */
  @Nullable public abstract String authority();
codefromthecrypt commented 4 years ago

I think this needs to be done before Brave 6, so that we can make a syntax for the Http rule sampler

trustin commented 4 years ago

The draft Javadoc sounds reasonable to me. One thing that sounds somewhat off is we use it also as an SNI server name, which may be worth its own tag for non-HTTP.

codefromthecrypt commented 4 years ago

One thing that sounds somewhat off is we use it also as an SNI server name, which may be worth its own tag for non-HTTP.

I think the problem was that depending on the library in use, it may have been derived from the SNI server name and we wouldn't know from the HTTP abstraction if that were the case or not. The note here was not suggesting to use it as the same as SNI server name, just warning that libraries could be muddy on this point.

trustin commented 4 years ago

I see. That's fine, then.

codefromthecrypt commented 4 years ago

I was thinking about this should be host and without the port. We already have the URL and also remoteEndpoint which has the port. We can already get the host header.

The most important part is sampling policy which involves the end-user parsing something at the moment, namely to strip the port.

In other words, I recommend we rename it back to "host" say this is not the same as the host header, as it will not include the port. Similar to okhttp's semantics, this is the url's host. https://github.com/square/okhttp/blob/master/okhttp/src/main/kotlin/okhttp3/HttpUrl.kt#L334

This implies that the implementations parse out a trailing :port when there is one, taking care to not nuke IPv6

As this is likely to be needed in RPC also, and there's no guarantee of URL use there (though it is the case in grpc and dubbo), I'd recommend we state on only on RPC, that this is the socket's or TLS SNI host when using the SNI extension. The same reason applies.. from an operator, we wouldn't expect someone to willingly choose to get no value for host, just because the middleware happens to not use URL identification. By falling back, we allow simple sampling rules like my.saas=100 tracesPerSecond

wdyt?

codefromthecrypt commented 4 years ago

fwiw the same thing mentioned about rpc applies in general for remote spans. ex kafka has a remote hostname of the broker, so does basically everything :P

trustin commented 4 years ago

That's fine, as long as it's documented clearly. 😄