grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Unexpected behavior with windows agent-flow installer: binaries are not updated #565

Open jkroepke opened 5 months ago

jkroepke commented 5 months ago

What's wrong?

Yesterday, we rolled out an update from 0.37.x to 0.40.3 to our machines and observed that the agent on some Windows machines didn't come up after the upgrade.

The agent exited with code 1 because it reported the new module syntax "import.http" as a syntax error, which looked suspicious to me. I first checked whether the setup itself had any issues, but the setup ran with exit code 0, and I could see that it had updated the registry values, too. At least, the new version is shown in the list of installed programs.

image

However, for unknown reasons, the binary was not updated. I could identify this from the "Date modified" value of the binary.

image

From other systems, I know that the binary from version 0.40.3 has a modified timestamp in 2024. It would be great to embed an application manifest in the binary that exposes the correct version.

In our logs, we can see that the installer returned exit code 0 and did not write any output to stdout or stderr.

Maybe the setup had issues replacing both binaries and failed silently.

One possible reason could be a slow shutdown of the service. I have no idea whether the command sc stop blocks until the service has stopped. As far as I know, binaries that are running cannot be replaced on Windows.
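The slow-shutdown hypothesis can be checked from a script by polling the service state instead of assuming that the stop request has completed. The following is only a minimal sketch: it assumes that `sc query` prints a `STATE` line such as `STATE : 1  STOPPED`, and the service name, timeout, and polling interval are hypothetical parameters that do not come from this issue.

```python
import re
import subprocess
import time
from typing import Callable, Optional


def parse_sc_state(output: str) -> Optional[str]:
    """Extract the state name (for example 'STOPPED') from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", output, re.MULTILINE)
    return match.group(1) if match else None


def query_service(name: str) -> str:
    """Run `sc query <name>` and return its stdout (Windows only)."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return result.stdout


def wait_until_stopped(
    name: str,
    query: Callable[[str], str] = query_service,
    timeout: float = 30.0,   # assumption: 30 seconds is long enough
    interval: float = 0.5,   # assumption: poll twice per second
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the service reports STOPPED.

    Returns True once the state is STOPPED, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if parse_sc_state(query(name)) == "STOPPED":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would run `sc stop` for the service first and replace the binaries only if `wait_until_stopped("<service name>")` returns `True`. The `query` and `sleep` parameters exist so the loop can be exercised without a real Windows service.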

Steps to reproduce

I have no idea how to reproduce it. Re-running the setup resolves the issue.

System information

Windows Server 2016

Software version

v0.40.3

Configuration

Running the setup with `/S`. The setup was executed as the built-in SYSTEM user.

Logs

The setup provides no logs.

rfratto commented 5 months ago

Hi there :wave:

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent Flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

hainenber commented 5 months ago

I intend to add a sleep of about 5 seconds after executing sc stop in the NSIS installer. Do you think that will resolve the suspected root cause?

jkroepke commented 5 months ago

Maybe. There are still very rare conditions: if Defender is scanning a binary, it cannot be updated either.

I have no idea why NSIS does not run into an error here.

Our current workaround is to compare checksums after the installation and, if there is a mismatch, run the installation again. This workaround runs in a loop with up to 10 tries.
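The checksum-and-retry workaround described above could look roughly like the sketch below. It is not the script used in this issue: the function names, the use of SHA-256, and the way the installer is invoked are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Union


def sha256_of(path: Union[str, Path]) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_until_verified(
    run_installer: Callable[[], None],  # hypothetical callback that runs the setup
    binary_path: Union[str, Path],      # installed binary to verify
    expected_sha256: str,               # checksum of the binary from the new release
    max_tries: int = 10,                # the issue mentions a loop with 10 tries
) -> int:
    """Run the installer until the installed binary matches the expected checksum.

    Returns the number of attempts that were needed and raises RuntimeError
    if the binary still does not match after max_tries attempts.
    """
    expected = expected_sha256.lower()
    for attempt in range(1, max_tries + 1):
        run_installer()
        if Path(binary_path).exists() and sha256_of(binary_path) == expected:
            return attempt
    raise RuntimeError(
        f"{binary_path} still has an unexpected checksum after {max_tries} tries"
    )
```

Here `run_installer` could wrap something like `subprocess.run([installer_path, "/S"], check=True)`, where `installer_path` is a placeholder. Because the installer exits with code 0 even when the binary was not replaced, the checksum comparison is the only signal this loop relies on.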

github-actions[bot] commented 4 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

jkroepke commented 4 months ago

needs-attention