Closed jmapro closed 2 years ago
Thanks @jmapro. We'll take a look at how best to address this.
Hi !
I just updated to v0.43.0 and this bug still seems to be present. My service was stopped and disabled after the update.
$ systemctl status splunk-otel-collector
● splunk-otel-collector.service - Splunk OpenTelemetry Collector
Loaded: loaded (/lib/systemd/system/splunk-otel-collector.service; disabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/splunk-otel-collector.service.d
└─service-owner.conf
Active: inactive (dead)
||/ Name Version Architecture Description
+++-=====================-============-============-=================================
iU splunk-otel-collector 0.43.0 amd64 Splunk OpenTelemetry Collector
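For reference, the `iU` prefix in the dpkg listing above reads as desired state `i` (install) and current state `U` (unpacked), which means the package files were unpacked but the configure step did not complete. A minimal shell sketch for spotting that state follows; `needs_configure` and `pkg_status` are hypothetical helper names, not part of the package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when a dpkg status abbreviation
# is anything other than "ii" (desired=install, status=installed).
needs_configure() {
    # Keep only the first two characters: desired action + current status.
    abbrev=$(printf '%s' "$1" | cut -c1-2)
    [ "$abbrev" != "ii" ]
}

# Hypothetical helper: prints the status abbreviation of a package,
# for example "ii " or "iU ". Prints nothing if the package is unknown.
pkg_status() {
    dpkg-query -W -f='${db:Status-Abbrev}' "$1" 2>/dev/null
}

# Example usage (not executed here):
#   if needs_configure "$(pkg_status splunk-otel-collector)"; then
#       sudo dpkg --configure -a
#   fi
```

Note that an empty status (package not installed at all) is also reported as needing attention by this sketch.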
@jmapro Please see if there are any errors in the journald logs (sudo journalctl -u splunk-otel-collector), or when starting the collector manually (otelcol --config=<path to your config file>).
Feb 07 02:02:27 systemd[1]: Stopping Splunk OpenTelemetry Collector...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.921Z info service/collector.go:166 Received signal from OS {"signal": "terminated"}
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.923Z info service/collector.go:255 Starting shutdown...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.926Z info healthcheck/handler.go:129 Health Check state change {"kind": "extension", "name": "health_check", "st>
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.931Z info service/service.go:121 Stopping receivers...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.941Z info prometheusexecreceiver@v0.41.0/receiver.go:252 Subprocess start delay {"kind": "receiver", "name": "pr>
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.953Z info service/service.go:126 Stopping processors...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.953Z info builder/pipelines_builder.go:73 Pipeline is shutting down... {"name": "pipeline", "name": "metrics"}
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info builder/pipelines_builder.go:77 Pipeline is shutdown. {"name": "pipeline", "name": "metrics"}
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info builder/pipelines_builder.go:73 Pipeline is shutting down... {"name": "pipeline", "name": "metrics/int>
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info builder/pipelines_builder.go:77 Pipeline is shutdown. {"name": "pipeline", "name": "metrics/internal"}
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info builder/pipelines_builder.go:73 Pipeline is shutting down... {"name": "pipeline", "name": "metrics/squ>
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info builder/pipelines_builder.go:77 Pipeline is shutdown. {"name": "pipeline", "name": "metrics/squid"}
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info service/service.go:131 Stopping exporters...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info service/service.go:136 Stopping extensions...
Feb 07 02:02:27 otelcol[229900]: 2022-02-07T02:02:27.954Z info service/collector.go:273 Shutdown complete.
Feb 07 02:02:27 systemd[1]: splunk-otel-collector.service: Succeeded.
Feb 07 02:02:27 systemd[1]: Stopped Splunk OpenTelemetry Collector.
-- Reboot --
Feb 07 09:20:58 systemd[1]: Started Splunk OpenTelemetry Collector.
Feb 07 09:20:59 otelcol[54160]: 2022/02/07 09:20:59 main.go:280: Set config to /etc/otel/collector/agent_config.yaml
Feb 07 09:20:59 otelcol[54160]: 2022/02/07 09:20:59 main.go:346: Set ballast to 168 MiB
Feb 07 09:20:59 otelcol[54160]: 2022/02/07 09:20:59 main.go:360: Set memory limit to 460 MiB
Feb 07 09:20:59 otelcol[54160]: 2022/02/07 09:20:59 remove_ballast_key.go:41: [WARNING] `ballast_size_mib` parameter in `memory_limiter` processor is deprecated. Please update the config accord>
Feb 07 09:20:59 otelcol[54160]: 2022/02/07 09:20:59 move_otlp_insecure.go:42: Unsupported key found: exporters::otlp::insecure. Moving to exporters::otlp::tls::insecure
The agent was stopped at 02:02:27 this morning for a server reboot after system patching. The service did not restart until I started it manually, because the deb package update left the service in the disabled state.
I had some non-blocking errors in the logs. The agent starts and sends all other metrics.
Feb 07 02:01:32 otelcol[229900]: 2022-02-07T02:01:32.127Z error subprocessmanager/manager.go:101 subprocess output line {"kind": "receiver", "name": "prometheus_exec/squid", "output": "2022/02/07 02:01:32 servicec times - could not parse line: Service Time Percentiles 5 min 60 min:"}
Feb 07 02:01:32 otelcol[229900]: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusexecreceiver/subprocessmanager.(*SubprocessConfig).pipeSubprocessOutput
Feb 07 02:01:32 otelcol[229900]: /builds/o11y-gdi/splunk-otel-collector-releaser/.go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusexecreceiver@v0.41.0/subprocessmanager/manager.go:101
Thanks @jmapro. I ran some basic upgrade tests, but have not yet been able to reproduce the issue. I'll continue to investigate, but please provide any additional info if possible.
Was the upgrade done with apt, or manually with dpkg, or some other method?
Is there anything relevant in /var/log/apt/*.log or /var/log/dpkg.log?
Thanks for your help @jcheng-splunk. I found an issue in my apt configuration: I have to force oldconf or something like that.
Start-Date: 2022-02-07 02:02:26
Commandline: apt-get -y --only-upgrade true install splunk-otel-collector=0.43.0
Requested-By: nxautomation (995)
Upgrade: splunk-otel-collector:amd64 (0.41.0, 0.43.0)
Error: Sub-process /usr/bin/dpkg returned an error code (1)
End-Date: 2022-02-07 02:03:01
Log started: 2022-02-07 04:10:41
Setting up splunk-otel-collector (0.43.0) ...
Configuration file '/etc/otel/collector/agent_config.yaml'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** agent_config.yaml (Y/I/N/O/D/Z) [default=N] ? dpkg: error processing package splunk-otel-collector (--configure):
end of file on stdin at conffile prompt
Errors were encountered while processing:
splunk-otel-collector
Log ended: 2022-02-07 04:10:42
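The log above shows dpkg stopping at the conffile prompt because the unattended run has no stdin to answer it. One common way to avoid the prompt is to pass dpkg's `--force-confdef` and `--force-confold` options through apt-get so that locally modified config files are kept. The sketch below assumes that approach; the `upgrade_keep_conf` function and the `APT_GET` override variable are hypothetical names introduced here for illustration and dry runs.

```shell
#!/bin/sh
# Upgrade one package without conffile prompts, keeping locally modified
# configuration files (dpkg options --force-confdef and --force-confold).
# APT_GET is a hypothetical override for dry runs, e.g. APT_GET=echo.
upgrade_keep_conf() {
    pkg_spec=$1
    DEBIAN_FRONTEND=noninteractive "${APT_GET:-apt-get}" -y \
        -o Dpkg::Options::=--force-confdef \
        -o Dpkg::Options::=--force-confold \
        --only-upgrade install "$pkg_spec"
}

# Example usage (not executed here):
#   upgrade_keep_conf splunk-otel-collector=0.43.0
# If a previous run already left the package unconfigured, this may finish it:
#   dpkg --force-confold --configure -a
```

With `APT_GET=echo` the function only prints the arguments it would pass, which makes it easy to check the command line before running it from automation.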
On update, the collector agent is stopped and disabled by the preinstall.sh script, but the postinstall.sh script never restarts it.
So when we run automatic system upgrades, we have to restart the agent manually. I think the agent should be restarted on upgrade.
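Until the packaging scripts handle this, a post-upgrade step in the automation could bring the service back. The following is only a sketch of such a workaround, not the package's own postinstall logic; `ensure_enabled_and_running` and the `SYSTEMCTL` override variable are hypothetical names.

```shell
#!/bin/sh
# Re-enable and start a systemd unit if an upgrade left it disabled or stopped.
# SYSTEMCTL is a hypothetical override for dry runs and testing.
ensure_enabled_and_running() {
    unit=$1
    sc=${SYSTEMCTL:-systemctl}
    # Enable the unit only if it is not already enabled.
    "$sc" is-enabled --quiet "$unit" || "$sc" enable "$unit"
    # Start the unit only if it is not already active.
    "$sc" is-active --quiet "$unit" || "$sc" start "$unit"
}

# Example usage (not executed here), typically as root after the upgrade:
#   ensure_enabled_and_running splunk-otel-collector
```

Checking the state first keeps the step idempotent, so it can run after every upgrade without touching a service that is already healthy.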
Tested OS: