jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.52k stars 2.44k forks source link

[jaeger-v2] Align Cassandra Storage Config With OTEL #5928

Closed mahadzaryab1 closed 1 month ago

mahadzaryab1 commented 2 months ago

Requirement

In this issue, we want to align the configuration for cassandra storage with OTEL (part of https://github.com/jaegertracing/jaeger/issues/5229)

Tasks / outcomes

mahadzaryab1 commented 2 months ago

@yurishkuro I had a question regarding the reuse of standard OTEL configs. For https://github.com/mahadzaryab1/jaeger/blob/main/pkg/cassandra/config/config.go#L37, should the type of this field be configtls.ClientConfig from go.opentelemetry.io/collector/config/configtls?

yurishkuro commented 2 months ago

Yes

mahadzaryab1 commented 2 months ago

hey @yurishkuro! i had a couple of questions about the TLS configs:

  1. I don't see an equivalent for https://github.com/jaegertracing/jaeger/blob/main/pkg/config/tlscfg/options.go#L160-L165 in OTEL's configtls. Can we simply just remove these calls to Stop now?
  2. How should we handle methods like https://github.com/jaegertracing/jaeger/blob/main/pkg/config/tlscfg/flags.go#L66? Do we want to make new methods that init from viper into the OTEL TLS type? Should this still live in the tlscfg package?
yurishkuro commented 2 months ago
  1. The reason they don't have it is because OTEL uses different ways of watching for file changes. Jaeger actually has a watcher (integrated with events from the file system) that notifies Jaeger immediately when the file is changed, but that comes at the cost of running a separate goroutine for receiving the events, which is why we need the Close function to clean it up. OTEL uses a simpler approach - they have a fixed timeout and every time they read TLS certs (e.g. when a new connection is established) they check if the timeout has expired, and if so they explicitly check if the file was changed and if so they reload the certs. Their approach does not require a goroutine, but it does introduce an additional one-off latency in the request processing.
  2. Our tlscfg struct is mostly compatible with OTEL's TLS, so there are different ways we can handle the translation. The simplest is we just copy tlscfg into otel TLS as needed.
mahadzaryab1 commented 2 months ago

@yurishkuro whats the reason that we have tags withmapstructure:"-" (ex. https://github.com/jaegertracing/jaeger/blob/main/pkg/cassandra/config/config.go#L36C2-L36C22?). Are these fields just temporarily disabled to be accepted via the config files?

mahadzaryab1 commented 2 months ago

@yurishkuro here's my proposal for the groupings of the cassandra v2 configs. Do you have any feedback? Let me know if you prefer to review it on the PR instead.

type Configuration struct {
    Connection    Connection    `mapstructure:"connection"`
    Data          Data          `mapstructure:"data"`
    Timeout       Timeout       `mapstructure:"timeout"`
    ProtoVersion  int           `mapstructure:"proto_version"`
    Authenticator Authenticator `mapstructure:"auth"`
}

type Connection struct {
    Servers              []string               `valid:"required,url" mapstructure:"servers"`
    LocalDC              string                 `mapstructure:"local_dc"`
    Port                 int                    `mapstructure:"port"`
    DisableAutoDiscovery bool                   `mapstructure:"-"`
    ConnectionsPerHost   int                    `mapstructure:"connections_per_host"`
    ReconnectInterval    time.Duration          `mapstructure:"reconnect_interval"`
    SocketKeepAlive      time.Duration          `mapstructure:"socket_keep_alive"`
    MaxRetryAttempts     int                    `mapstructure:"max_retry_attempts"`
    TLS                  configtls.ClientConfig `mapstructure:"tls"`
}

type Data struct {
    Keyspace           string `mapstructure:"keyspace"`
    DisableCompression bool   `mapstructure:"disable_compression"`
    Consistency        string `mapstructure:"consistency"`
}

type Timeout struct {
    Query   time.Duration `mapstructure:"-"`
    Connect time.Duration `mapstructure:"connection_timeout"`
}

type Authenticator struct {
    Basic BasicAuthenticator `yaml:"basic" mapstructure:",squash"`
}

type BasicAuthenticator struct {
    Username              string   `yaml:"username" mapstructure:"username"`
    Password              string   `yaml:"password" mapstructure:"password" json:"-"`
    AllowedAuthenticators []string `yaml:"allowed_authenticators" mapstructure:"allowed_authenticators"`
}
yurishkuro commented 2 months ago

whats the reason that we have tags with mapstructure:"-"

No reason / laziness. v1 did not make any use of mapstructure tags, but v2 does, so we need to define them. I don't see why DisableAutoDiscovery would not be needed.

here's my proposal for the groupings of the cassandra v2 config

Overall looks good to me, but a few thoughts:

mahadzaryab1 commented 2 months ago

What specific benefits do you see in separating connection and data?

I mostly just did for readability. Should I move the data groupings under connection?

Timeout and ProtoVersion at the top level are a bit odd, won't they conceptually fit under Connection? Also, timeout needs to be renamed to a more concrete name - is it connection timeout, request timeout?

I can move Timeout and ProtoVersion under connection. Timeout is a higher level grouping and it has two fields underneath; Query and Connect. Does that make sense? Let me know if you want any improvements here.

Would you mind collaborating with @akstron on https://github.com/jaegertracing/jaeger/pull/5922 where new config options are being proposed (only used during schema creation), which might affect how you are thinking about logical separation of fields.

Sure!

yurishkuro commented 2 months ago

I would flatten timeouts into Connection

yurishkuro commented 2 months ago

Why wouldn't auth go under connection?

Consistency seems a connection concept, not schema's

mahadzaryab1 commented 2 months ago

Thanks for your feedback! @yurishkuro. I'll address these and push up my changes.