opensearch-project / opensearch-ci

Enables continuous integration across OpenSearch, OpenSearch Dashboards, and plugins.
Apache License 2.0

[BUG] Gradle Check Log missing after a while #484

Closed peterzhuamazon closed 1 month ago

peterzhuamazon commented 1 month ago

Hi,

Recently, the gradle check logs have been going missing from the Jenkins console over time: https://build.ci.opensearch.org/blue/organizations/jenkins/gradle-check/detail/gradle-check/43293/pipeline

java.io.FileNotFoundException: /var/jenkins_home/jobs/gradle-check/builds/43293/log (No such file or directory)
    at java.base/java.io.RandomAccessFile.open0(Native Method)
    at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:356)

These failures are observed on both genuinely failed runs and successful runs.

Thanks.


PRs:

peterzhuamazon commented 1 month ago

Adding latest update from the docs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html

auto_removal – Optional. If this is true, the CloudWatch agent automatically deletes this log file after reading it and it has been rotated. Usually the log files are deleted after their entire contents are uploaded to CloudWatch Logs, but if the agent reaches the EOF (end of file) and also detects another newer log file that matches the same file_path, the agent deletes the OLD file, so you must make sure that you are done writing to the OLD file before creating the NEW file. The [RUST tracing library](https://docs.rs/tracing/latest/tracing/) has a known incompatibility because it will potentially create a NEW log file and then still attempt to write to the OLD log file.

The agent only removes complete files from logs that create multiple files, such as logs that create separate files for each date. If a log continuously writes to a single file, it is not removed.

If you already have a log file rotation or removal method in place, we recommend that you omit this field or set it to false.

If you omit this field, the default value of false is used.
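
As an illustration of the recommendation above, here is a minimal sketch that generates a CloudWatch agent configuration with auto_removal explicitly set to false. The file_path pattern and the log group name are assumptions made for the example, not values taken from the opensearch-ci setup; only the logs / logs_collected / files / collect_list layout and the auto_removal field come from the CloudWatch agent documentation.

```python
import json


def build_agent_config(file_path, log_group_name, auto_removal=False):
    """Return a CloudWatch agent config dict that collects one log file pattern.

    auto_removal defaults to False, which matches the documented default and
    leaves file cleanup to whatever rotation or removal already exists.
    """
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,
                            "log_group_name": log_group_name,
                            "auto_removal": auto_removal,
                        }
                    ]
                }
            }
        }
    }


# Both values below are hypothetical placeholders, not the real CI settings.
config = build_agent_config(
    file_path="/var/jenkins_home/jobs/*/builds/*/log",
    log_group_name="example-jenkins-build-logs",
)

# Render the JSON that would go into the agent configuration file.
print(json.dumps(config, indent=2))
```

Since Jenkins manages build logs itself, leaving auto_removal at false (or omitting it) keeps the agent from deleting files that a build may still write to.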

peterzhuamazon commented 1 month ago

In particular, the symptoms match this case: a run is waiting for a Jenkins agent, so its log file sits at EOF and receives no further writes until the agent comes online. As soon as the agent comes online and the run tries to write new content, the log is treated as a new file, and the CloudWatch agent then deletes the file.
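
To make the quoted rule concrete, here is a toy model of it: a file is removed once it has been read to EOF and a newer file matching the same file_path exists. This is only a sketch of the documented behavior, not the CloudWatch agent's actual code, and the thread does not confirm that this is exactly what happened. The LogFile type, the function name, the pattern, and build 43294 are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class LogFile:
    path: str
    mtime: float        # last modification time, in seconds
    read_to_eof: bool   # True once everything written so far has been uploaded


def files_agent_would_delete(files, pattern):
    """Toy version of the auto_removal rule quoted from the docs.

    A file is a deletion candidate when it matches the pattern, has been
    read to EOF, and some other matching file is newer.
    """
    matching = [f for f in files if fnmatch(f.path, pattern)]
    if not matching:
        return []
    newest = max(f.mtime for f in matching)
    return [f.path for f in matching if f.read_to_eof and f.mtime < newest]


# Assumed wildcard pattern covering every build log of the job.
pattern = "/var/jenkins_home/jobs/gradle-check/builds/*/log"

files = [
    # Build 43293 is idle while waiting for an agent, so its log is at EOF.
    LogFile("/var/jenkins_home/jobs/gradle-check/builds/43293/log", 100.0, True),
    # A later (hypothetical) build creates a newer file matching the pattern.
    LogFile("/var/jenkins_home/jobs/gradle-check/builds/43294/log", 200.0, False),
]

deleted = files_agent_would_delete(files, pattern)
print(deleted)  # ['/var/jenkins_home/jobs/gradle-check/builds/43293/log']
```

Under this model the idle log is removed even though its build has not finished, which would produce the FileNotFoundException once Jenkins reopens the file.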

peterzhuamazon commented 1 month ago

Seems good. Thanks. https://build.ci.opensearch.org/job/gradle-check/43385/console