magnusbaeck / logstash-filter-verifier

Apache License 2.0
191 stars 27 forks source link

daemon sometimes hanging after running multiple tests #132

Closed jgough closed 2 years ago

jgough commented 2 years ago

Sometimes when running tests in beta 1 I find that the daemon can hang and become unresponsive. I've managed to reproduce this in a docker container with the configuration below:

/usr/share/logstash/config/pipelines.yml:

- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/*.conf"
/usr/share/logstash/tests/test1.yml:

input_plugin: "pipeline1_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo
/usr/share/logstash/tests/test2.yml:

fields:
  bar: 1234
input_plugin: "pipeline2_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo

(Side note: this test above returns an error when running. Are ints in field values disallowed?)

/usr/share/logstash/pipeline/pipeline1/input.conf:

input { stdin { id => pipeline1_input } }
output { stdout {} }
/usr/share/logstash/pipeline/pipeline2/input.conf:

input { stdin { id => pipeline2_input } }
output { stdout {} }

If I run the sequence of commands below then the daemon does not respond to the last (or any subsequent) calls to run tests

[root@fdbcc49e3ad7 logstash]# logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
logstash-filter-verifier: error: Error reading/unmarshalling /usr/share/logstash/tests/test2.yml: Problem converting field "bar" to Logstash format: Unsupported type int: 1234
[root@fdbcc49e3ad7 logstash]# logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
☐ Comparing message 1 of 1 from test1.yml:
--- /tmp/826471442/test1.yml/1/expected 2021-09-10 12:19:16.854449387 +0000
+++ /tmp/826471442/test1.yml/1/actual   2021-09-10 12:19:16.854449387 +0000
@@ -1,3 +1,4 @@
 {
+  "@timestamp": "2021-09-10T12:19:16.702Z",
   "message": "foo"
 }

Summary: ☐ All tests: 0/1
         ☐ test1.yml: 0/1
logstash-filter-verifier: error: failed test cases
[root@fdbcc49e3ad7 logstash]# logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
logstash-filter-verifier: error: Error reading/unmarshalling /usr/share/logstash/tests/generic-css/test2.yml: Problem converting field "bar" to Logstash format: Unsupported type int: 1234
[root@fdbcc49e3ad7 logstash]# logstash-filter-verifier daemon run --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id

(Test hangs here with no further output)

Here is the log from the daemon:

bash-4.2$ logstash-filter-verifier daemon start --loglevel DEBUG
config: socket: /tmp/logstash-filter-verifier.sock
config: logstash-path: /usr/share/logstash/bin/logstash
Temporary directory for daemon created in "/tmp/lfv-472481948"
state change: "created" -> "started" by "start"
start stdout scanner
start stderr scanner
Waiting for /tmp/lfv-472481948/logstash-instance/6T7FS2Rl/logstash.log to appear...
Daemon listening on /tmp/logstash-filter-verifier.sock
stdout:  Using bundled JDK: /usr/share/logstash/jdk
stderr:  OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
stderr:  WARNING: An illegal reflective access operation has occurred
stderr:  WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/tmp/jruby-9449/jruby7941132299497382746jopenssl.jar) to field java.security.MessageDigest.provider
stderr:  WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
stderr:  WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
stderr:  WARNING: All illegal access operations will be denied in a future release
stdout:  Sending Logstash logs to /tmp/lfv-472481948/logstash-instance/6T7FS2Rl which is now configured via log4j2.properties
taillog: -> pipeline started: stdin
taillog: -> pipeline started: output
taillog: -> pipeline running: [output stdin]
Ready to process tests
state change: "started" -> "ready" by "pipeline-ready"
waitForState: ready
state change: "ready" -> "setting_up_test" by "setup-test"
taillog: -> pipeline started: lfv_9MB1Vkvr_pipeline1
taillog: -> pipeline started: lfv_9MB1Vkvr_pipeline2
taillog: -> pipeline started: lfv_output_4yQgfZr4
taillog: -> pipeline started: lfv_output_Am8AGcba
taillog: -> pipeline running: [output stdin lfv_9MB1Vkvr_pipeline2 lfv_9MB1Vkvr_pipeline1 lfv_output_Am8AGcba lfv_output_4yQgfZr4]
state change: "setting_up_test" -> "ready_for_test" by "pipeline-ready"
state change: "created" -> "started" by "start"
start stdout scanner
Waiting for /tmp/lfv-472481948/logstash-instance/TWGXc9MZ/logstash.log to appear...
start stderr scanner
waitForState: ready
stdout:  Using bundled JDK: /usr/share/logstash/jdk
stderr:  OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
stderr:  WARNING: An illegal reflective access operation has occurred
stderr:  WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/tmp/jruby-9591/jruby842451978263668045jopenssl.jar) to field java.security.MessageDigest.provider
stderr:  WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
stderr:  WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
stderr:  WARNING: All illegal access operations will be denied in a future release
stdout:  Sending Logstash logs to /tmp/lfv-472481948/logstash-instance/TWGXc9MZ which is now configured via log4j2.properties
taillog: -> pipeline started: stdin
taillog: -> pipeline started: output
taillog: -> pipeline running: [output stdin]
Ready to process tests
state change: "started" -> "ready" by "pipeline-ready"
state change: "ready" -> "setting_up_test" by "setup-test"
waitForState: ready_for_test
taillog: -> pipeline started: lfv_cykqQbnV_pipeline2
taillog: -> pipeline started: lfv_cykqQbnV_pipeline1
taillog: -> pipeline started: lfv_output_KL0k5NKp
taillog: -> pipeline started: lfv_output_D8LPYHuC
taillog: -> pipeline running: [output stdin lfv_cykqQbnV_pipeline2 lfv_cykqQbnV_pipeline1 lfv_output_KL0k5NKp lfv_output_D8LPYHuC]
state change: "setting_up_test" -> "ready_for_test" by "pipeline-ready"
state change: "ready_for_test" -> "executing_test" by "execute-test"
waitForState: ready_for_test
taillog: -> pipeline started: lfv_input_1
taillog: -> pipeline running: [output stdin lfv_cykqQbnV_pipeline2 lfv_cykqQbnV_pipeline1 lfv_output_KL0k5NKp lfv_output_D8LPYHuC lfv_input_1]
state change: "executing_test" -> "running_test" by "pipeline-ready"
stdout:  {"@version":"1","message":"foo","__lfv_out_passed":"D8LPYHuC","@timestamp":"2021-09-10T12:19:16.702Z","tags":["__lfv_in_passed","__lfv_out_D8LPYHuC_passed"],"__lfv_metadata":{"__lfv_fields":{}},"__lfv_id":"0"}
waitForState: running_test
state change: "running_test" -> "ready_for_test" by "test-complete"
waitForState: ready_for_test
state change: "ready_for_test" -> "started" by "teardown"
taillog: -> pipeline running: [output stdin]
state change: "started" -> "ready" by "pipeline-ready"
waitForState: ready
state change: "ready" -> "setting_up_test" by "setup-test"
taillog: -> pipeline started: lfv_1MsvMrR2_pipeline2
taillog: -> pipeline started: lfv_1MsvMrR2_pipeline1
taillog: -> pipeline started: lfv_output_UHzWd3ND
taillog: -> pipeline started: lfv_output_PUrEGcvh
taillog: -> pipeline running: [output stdin lfv_1MsvMrR2_pipeline2 lfv_1MsvMrR2_pipeline1 lfv_output_UHzWd3ND lfv_output_PUrEGcvh]
state change: "setting_up_test" -> "ready_for_test" by "pipeline-ready"

Please let me know if I can provide more info on this.

breml commented 2 years ago

I observed this myself from time to time. Until now, I was not able to reproduce this in a reliable way. I assume some sort of timing issue, which leads to a situation, where the daemon is out of sync with the state of Logstash. Unfortunately, there is not much I can do about this at the moment, except for monitoring this further.

jgough commented 2 years ago

I found I could repro this 100% with the steps above. Would it help if I tried to make a docker container to repro if you are unable to reproduce this?

jgough commented 2 years ago

Here is a Dockerfile that starts a daemon then runs the four tests as described above. The fourth test always fails and the daemon always hangs.

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.10.2

ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.1

USER root
RUN yum clean expire-cache && yum update -y && yum install curl && yum clean all
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
    && mv /opt/logstash-filter-verifier /usr/bin/

USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/*.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test1.yml
input_plugin: "pipeline1_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo
EOT

cat <<EOT > /usr/share/logstash/tests/test2.yml
fields:
  bar: 1234
input_plugin: "pipeline2_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/input.conf
input { stdin { id => pipeline1_input } }
output { stdout {} }
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline2/input.conf
input { stdin { id => pipeline2_input } }
output { stdout {} }
EOT

cat <<EOT > /usr/share/logstash/run_tests.sh
logstash-filter-verifier daemon start &
sleep 10
echo "Running Test 1"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
echo "Running Test 2"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
echo "Running Test 3"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
echo "Running Test 4"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
echo "Done"
EOT

chmod a+x run_tests.sh
EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

If you whack that into a Dockerfile and then run:

DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test

Then the Daemon will hang on the fourth test. Note: need to use buildkit since this Dockerfile uses heredocs

Hope this helps

breml commented 2 years ago

@jgough Thanks for the fantastic Dockerfile, this really reproduced the problem consistently and it helped me a lot to tackle the issue. I created an other PR (#137) with the fix for this problem.

The problem is triggered by the failing int field (addressed in #136 ). Unfortunately the teardown sequence was not done correctly and therefore the daemon was not able to accept an other test session.