Closed jgough closed 2 years ago
I have seen this myself from time to time, but so far I have not been able to reproduce it reliably. I assume it is some sort of timing issue that leaves the daemon out of sync with the state of Logstash. Unfortunately, there is not much I can do about it at the moment except keep monitoring it.
I can reproduce this 100% of the time with the steps above. Would it help if I put together a Docker container that reproduces it, since you are unable to?
Here is a Dockerfile that starts a daemon then runs the four tests as described above. The fourth test always fails and the daemon always hangs.
# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.10.2
ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.1
USER root
RUN yum clean expire-cache && yum update -y && yum install -y curl && yum clean all
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
&& mv /opt/logstash-filter-verifier /usr/bin/
USER logstash
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2
cat <<EOT > /usr/share/logstash/config/pipelines.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/*.conf"
EOT
cat <<EOT > /usr/share/logstash/tests/test1.yml
input_plugin: "pipeline1_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo
EOT
cat <<EOT > /usr/share/logstash/tests/test2.yml
fields:
  bar: 1234
input_plugin: "pipeline2_input"
testcases:
  - input:
      - foo
    expected:
      - message: foo
EOT
cat <<EOT > /usr/share/logstash/pipeline/pipeline1/input.conf
input { stdin { id => pipeline1_input } }
output { stdout {} }
EOT
cat <<EOT > /usr/share/logstash/pipeline/pipeline2/input.conf
input { stdin { id => pipeline2_input } }
output { stdout {} }
EOT
cat <<EOT > /usr/share/logstash/run_tests.sh
logstash-filter-verifier daemon start &
sleep 10
echo "Running Test 1"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
echo "Running Test 2"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
echo "Running Test 3"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test2.yml --add-missing-id
echo "Running Test 4"
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipelines.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
echo "Done"
EOT
chmod a+x run_tests.sh
EOF
CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]
If you whack that into a Dockerfile and then run:
DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test
Then the daemon will hang on the fourth test. Note: BuildKit is required because this Dockerfile uses heredocs.
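Since the fourth run blocks forever, it may be handy to put a time limit on each run so the repro exits with a clear failure instead of hanging. Below is a minimal sketch, assuming GNU coreutils `timeout` is available in the image; `run_with_limit` is a hypothetical helper name, not part of logstash-filter-verifier.

```shell
#!/usr/bin/env bash
# Sketch only: run a command under a time limit and flag a suspected hang.
# run_with_limit is a hypothetical helper; it relies on GNU coreutils `timeout`,
# which exits with status 124 when the time limit is reached.
run_with_limit() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "HANG: no result within ${limit}s: $*" >&2
  fi
  return "$status"
}
```

In `run_tests.sh`, each `logstash-filter-verifier daemon run` line could then be prefixed with something like `run_with_limit 120`, where 120 seconds is an arbitrary guess at a generous limit.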
Hope this helps
@jgough Thanks for the fantastic Dockerfile. It reproduced the problem consistently and helped me a lot in tackling the issue. I created another PR (#137) with the fix for this problem.
The problem is triggered by the failing int field (addressed in #136). Unfortunately, the teardown sequence was not done correctly, and therefore the daemon was not able to accept another test session.
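To illustrate the general failure mode (this is an analogy only, not the daemon's code and not the actual change in #137): if teardown runs only on the success path, a failed session leaves its resources claimed and the next session is refused. The sketch below uses a lock directory and an `EXIT` trap so that teardown also runs when the session fails; `LOCK_DIR` and `run_session` are hypothetical names.

```shell
#!/usr/bin/env bash
# Analogy only: not taken from logstash-filter-verifier.
# LOCK_DIR and run_session are hypothetical names used for illustration.
LOCK_DIR="${TMPDIR:-/tmp}/lfv-session-demo.lock"

run_session() {
  # mkdir is atomic, so it serves as a simple "one session at a time" lock.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "busy: a previous session never released the lock" >&2
    return 2
  fi
  (
    # The EXIT trap runs the teardown whether the command succeeds or fails.
    trap 'rmdir "$LOCK_DIR"' EXIT
    "$@"
  )
}
```

Without the trap, a failing command would leave the lock directory behind, and every later `run_session` call would report `busy`, which is roughly the symptom seen in this issue.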
Sometimes when running tests in beta 1 I find that the daemon can hang and become unresponsive. I've managed to reproduce this in a docker container with the configuration below:
(Side note: this test above returns an error when running. Are ints in field values disallowed?)
If I run the sequence of commands below, the daemon does not respond to the last (or any subsequent) call to run tests:
(Test hangs here with no further output)
Here is the log from the daemon:
Please let me know if I can provide more info on this.