open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

OpampSupervisor/OpampExtension is not restarting if the remote config changes applied to the processor pipeline #34377

Open MSA0208 opened 4 months ago

MSA0208 commented 4 months ago

Component(s)

cmd/opampsupervisor, extension/opamp

Describe the issue you're reporting

Hi,

I have started the OpAMP server and, in parallel, the supervisor, whose configuration holds the extension and collector details needed to run the collector.

My collector includes the transform processor as part of the pipeline.

I am using OpAMP to push remote configuration and restart my collector. I have observed that the collector restarts when I add new configuration to the pipeline, and that works fine.

However, when I update the transform processor's configuration by pushing a changed config.yaml remotely, the server accepts the changes, but I see no restart of the collector for them.

Any input on processor config changes? Is this capability available for any change in the config, or is it limited to the service pipeline alone?

Frapschen commented 4 months ago

@MSA0208 Can you share your opampsupervisor and otel collector config?

MSA0208 commented 4 months ago

@Frapschen Not much has changed from the existing opampsupervisor setup apart from bootstrap.yaml, which holds my own collector configuration for starting my collector binary. For now I am using bootstrap.yaml itself as my config.yaml.

MSA0208 commented 4 months ago

I am now able to restart the collector if there is any change in the pipeline; otherwise it does not restart.

What I want to know is this: if we make a small change in a processor, e.g. an OTTL statement, I want the collector to restart and pick up that change.

bacherfl commented 3 months ago

Hi @MSA0208, is this issue still occurring for you? I just tried this out with the current state on main, and it seems that the agent is restarted when editing something in e.g. the transform processor. For example, I started with the following additional configuration, which I set in the OpAMP server:

processors:
    transform:
        error_mode: ignore
        flatten_data: false
        log_statements: []
        metric_statements:
            - conditions: []
              context: metric
              statements: []
        trace_statements:
            - conditions: []
              context: resource
              statements:
                - keep_matching_keys(attributes, "^(aaa|bbb|c).*")

exporters:
  debug:
service:
  pipelines:
    metrics:
      exporters:
      - debug
      processors:
      - transform
      receivers:
      - prometheus/own_metrics

And the agent was restarted. After that, I changed one of the ottl statements and the agent was restarted again.

Can you maybe share an example for the config you were using so I can try to reproduce the issue?

MSA0208 commented 3 months ago

Hi @bacherfl,

That issue is now resolved and everything works as expected. The issue I am facing now is with TLS certs: it always says "first record does not look like a TLS handshake", possibly because of the supervisor.

I tried connecting the OpAMP server and the OpAMP agent client using TLS, and that works, but when I put the supervisor in the middle I get the error mentioned above.

I am still debugging TLS with respect to the supervisor, so any input here will help.

bacherfl commented 3 months ago

Thanks for the update @MSA0208 - Can you share the opampsupervisor config you are using? Then I will try to see if I can reproduce the issue you are having with TLS

MSA0208 commented 3 months ago

server:
  endpoint: ws://127.0.0.1:4320/v1/opamp
  tls:
    insecure_skip_verify: true
    ca_file: "/root/OTEL98/opamp-go-main/internal/certs/certs/ca.cert.pem"
    cert_file: "/root/OTEL98/opamp-go-main/internal/certs/server_certs/server.cert.pem"
    key_file: "/root/OTEL98/opamp-go-main/internal/certs/server_certs/server.key.pem"

capabilities:
  # Keys with boolean true/false values that enable a particular
  # OpAMP capability.
  # The Supervisor will accept remote configuration from the Server.
  # If enabled the Supervisor will also report RemoteConfig status
  # to the Server.
  AcceptsRemoteConfig: true # false if unspecified
  accepts_remote_config: true
  reports_remote_config: true
  accepts_restart_command: true
  reports_effective_config: true
  reports_own_metrics: true
  reports_health: true
  accepts_opamp_connection_settings: true

storage:
agent:
  executable: /root/PI40/aiopsx-platform-NGx_NorthBound/cmd/otelcol-ngx/ngx-connector
  executable: ../cmd/otelcol-ngx/ngx-connector
  args: --config
  env:
  config_fil: ./config.yaml
  access_dirs:
    read:
      allow: [/var/log]
      deny: [/var/log/secret_logs]
    write:
      allow: [/var/otelcol]

This is the supervisor.yaml file. I am using the same TLS settings when starting the OpAMP server, and I pass the same settings in my actual config.yaml file as well.

bacherfl commented 2 months ago

Hi @MSA0208, and sorry for the late reply. I have now looked into the issue you are having with TLS. Looking at the config, you are using the private key and certificate used by the opamp server, i.e. this one: https://github.com/open-telemetry/opamp-go/tree/main/internal/certs/server_certs. However, this certificate cannot be used for authenticating clients at the server, as it lacks the TLS Web Client Authentication key usage extension. For this reason, the opamp agent client example creates its own key pair when connecting to the server (see https://github.com/open-telemetry/opamp-go/blob/ad5317009abb490ff5e57e564ac8e82f70f9f477/internal/examples/agent/agent/agent.go#L363). You can use that as a reference to create a key pair for the supervisor and then use the new key pair to connect to the opamp server.

MSA0208 commented 2 months ago

Hi @bacherfl, thank you for your input. I am using OpenSSL to generate these certs and using the same ones in the opamp extension, supervisor.yaml and the opamp server. So you are saying we need to create a key pair and use it in all three places mentioned above? I am a bit confused here: should I use the new key pair as part of the supervisor config or the opampextension config?

From the supervisor logs I see that it always falls back to HTTP: settings.TLSConfig : settings.httpMiddleware : hs.TLSConfig from serverimpl.go Falling back to http!!! with listenAddr : localhost:4322 Started startHttpServer with listenAddr: localhost:4322

In the agent (i.e. collector) log, the error is: 2024-10-03T12:20:21.673-0700 error opampextension@v0.98.0/opamp_agent.go:76 Failed to connect to the OpAMP server {"kind": "extension", "name": "opamp", "error": "tls: first record does not look like a TLS handshake"} gitlab.otxlab.net/itom/opr/opsb-content/aiopsx-platform/extension/opampextension.(*opampAgent).Start.func2

bacherfl commented 2 months ago

> Hi @bacherfl, thank you for your input. I am using OpenSSL to generate these certs and using the same ones in the opamp extension, supervisor.yaml and the opamp server. So you are saying we need to create a key pair and use it in all three places mentioned above? I am a bit confused here: should I use the new key pair as part of the supervisor config or the opampextension config?

No, you can keep using the key pair you were using for the server, but you need to create a separate key pair (using the same certificate authority used for creating the server certificates, i.e. this one) with the TLS Web Client Authentication key usage extension enabled, and use that for the supervisor.
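Since OpenSSL was mentioned earlier in the thread, here is a rough sketch of the commands that might produce such a client key pair. It only builds and prints the command lines; it does not run them. The file names (`supervisor.key.pem`, `supervisor.ext`, and so on) and the CA private key name `ca.key.pem` are placeholders I made up for illustration, so adjust them to wherever your CA files actually live. `clientAuth` is OpenSSL's name for the TLS Web Client Authentication extended key usage.

```python
import shlex


def client_cert_commands(ca_cert, ca_key, name="supervisor"):
    """Return (ext_file_name, ext_file_body, commands) for a client certificate.

    All output file names are hypothetical placeholders derived from `name`.
    """
    key = f"{name}.key.pem"
    csr = f"{name}.csr.pem"
    cert = f"{name}.cert.pem"
    ext_file = f"{name}.ext"
    # clientAuth is the OpenSSL name for "TLS Web Client Authentication".
    ext_body = "extendedKeyUsage = clientAuth\n"
    commands = [
        # 1. Private key for the supervisor.
        ["openssl", "genrsa", "-out", key, "2048"],
        # 2. Certificate signing request for that key.
        ["openssl", "req", "-new", "-key", key, "-subj", f"/CN={name}", "-out", csr],
        # 3. Sign the request with the existing CA, applying the extension file.
        ["openssl", "x509", "-req", "-in", csr, "-CA", ca_cert, "-CAkey", ca_key,
         "-CAcreateserial", "-extfile", ext_file, "-days", "365", "-out", cert],
    ]
    return ext_file, ext_body, commands


if __name__ == "__main__":
    # ca.cert.pem / ca.key.pem are assumed names for the CA certificate and key.
    ext_file, ext_body, commands = client_cert_commands("ca.cert.pem", "ca.key.pem")
    print(f"# contents of {ext_file}:")
    print(ext_body, end="")
    for cmd in commands:
        print(shlex.join(cmd))
```

The resulting certificate and key would then go into `cert_file` and `key_file` under `server.tls` in supervisor.yaml, while the opamp server keeps its existing server key pair.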

> From the supervisor logs I see that it always falls back to HTTP: settings.TLSConfig : settings.httpMiddleware : hs.TLSConfig from serverimpl.go Falling back to http!!! with listenAddr : localhost:4322 Started startHttpServer with listenAddr: localhost:4322

> In the agent (i.e. collector) log, the error is: 2024-10-03T12:20:21.673-0700 error opampextension@v0.98.0/opamp_agent.go:76 Failed to connect to the OpAMP server {"kind": "extension", "name": "opamp", "error": "tls: first record does not look like a TLS handshake"} gitlab.otxlab.net/itom/opr/opsb-content/aiopsx-platform/extension/opampextension.(*opampAgent).Start.func2

I noticed that the opamp server URL in the config you posted earlier started with ws. Due to the change in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35363, the TLS settings are only applied if the server URL starts with https or wss.
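To make the rule concrete, here is a small illustration of the scheme check as described above. This is not the supervisor's actual code, just a sketch of the behaviour; the function name `tls_settings_apply` is made up for the example.

```python
from urllib.parse import urlparse

# Schemes for which, per the comment above, the TLS settings take effect.
TLS_SCHEMES = {"https", "wss"}


def tls_settings_apply(endpoint: str) -> bool:
    """Return True if the endpoint's scheme is https or wss."""
    return urlparse(endpoint).scheme in TLS_SCHEMES


if __name__ == "__main__":
    for url in ("ws://127.0.0.1:4320/v1/opamp", "wss://127.0.0.1:4320/v1/opamp"):
        status = "applied" if tls_settings_apply(url) else "ignored"
        print(f"{url}: TLS settings {status}")
```

With the `ws://127.0.0.1:4320/v1/opamp` endpoint from the posted supervisor.yaml, the check is false, so the `tls` block would have no effect until the endpoint is changed to `wss://`.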

github-actions[bot] commented 21 hours ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
