
Codec sometimes not preserved on input plugin #138

Closed jgough closed 3 years ago

jgough commented 3 years ago

In beta 1, the input plugin's codec is sometimes not carried over to the generated test input when running the tests.

For example I have the following file in my pipeline:

/usr/share/logstash/pipeline/pipeline2/01-input.conf:
input {
        s3 {
                codec => csv {
                        autodetect_column_names => true
                        skip_empty_columns => true
                }
                id => "input"
        }
}
output { stdout {} }

And the test

input_plugin: "input"
ignore:
  - "@timestamp"
testcases:
  - input:
    - >
      "field1","field2","field3"
    expected: []
  - input:
    - >
      "a","b","c"
    expected:
      - field1: "a"
        field2: "b"
        field3: "c"

Then the tests pass, and the generator input that lfv creates in /tmp/lfv-*/session/*/lfv_inputs/1/input.conf correctly looks like this:

input {
  generator {
    lines => [
      '"field1","field2","field3"
', '"a","b","c"
'
    ]
    codec => csv {
      autodetect_column_names => true
      skip_empty_columns => true
    }
    count => 1
    threads => 1
  }
}

If I simply create an empty file called 99-output.conf in the same directory and run again, then strangely it fails: the codec reverts to plain and the generated input looks like this:

input {
  generator {
    lines => [
      '"field1","field2","field3"
', '"a","b","c"
'
    ]
    codec => plain
    count => 1
    threads => 1
  }
}

Here is a Dockerfile that can be used to repro this issue, if that is of any help:

# syntax=docker/dockerfile:1.3-labs
FROM docker.elastic.co/logstash/logstash:7.10.2

ENV LOGSTASH_FILTER_VERIFIER_VERSION v2.0.0-beta.1

RUN logstash-plugin install logstash-codec-csv

USER root

RUN yum clean expire-cache &&\
    yum update -y && \
    yum install -y curl && \
    yum clean all

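# Download the logstash-filter-verifier release tarball and install the binary.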
ADD https://github.com/magnusbaeck/logstash-filter-verifier/releases/download/${LOGSTASH_FILTER_VERIFIER_VERSION}/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz /opt/
RUN tar xvzf /opt/logstash-filter-verifier_${LOGSTASH_FILTER_VERIFIER_VERSION}_linux_386.tar.gz -C /opt \
    && mv /opt/logstash-filter-verifier /usr/bin/

USER logstash

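# Create the pipeline definitions, the testcase file and the test-runner script
# in a single heredoc RUN step (this needs the BuildKit labs syntax declared at
# the top of the file).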
RUN <<EOF
mkdir tests
mkdir pipeline/pipeline1
mkdir pipeline/pipeline2

cat <<EOT > /usr/share/logstash/config/pipeline1.yml
- pipeline.id: pipeline1
  path.config: "pipeline/pipeline1/*.conf"
EOT

cat <<EOT > /usr/share/logstash/config/pipeline2.yml
- pipeline.id: pipeline2
  path.config: "pipeline/pipeline2/*.conf"
EOT

cat <<EOT > /usr/share/logstash/tests/test1.yml
input_plugin: "input"
ignore:
  - "@timestamp"
testcases:
  - input:
    - >
      "field1","field2","field3"
    expected: []
  - input:
    - >
      "a","b","c"
    expected:
      - field1: "a"
        field2: "b"
        field3: "c"
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline1/01-input.conf
input {
        s3 {
                codec => csv {
                        autodetect_column_names => true
                        skip_empty_columns => true
                }
                id => "input"
        }
}

output { stdout {} }
EOT

cat <<EOT > /usr/share/logstash/pipeline/pipeline2/01-input.conf
input {
        s3 {
                codec => csv {
                        autodetect_column_names => true
                        skip_empty_columns => true
                }
                id => "input"
        }
}
output { stdout {} }
EOT

touch /usr/share/logstash/pipeline/pipeline2/99-output.conf

cat <<EOT > /usr/share/logstash/run_tests.sh
logstash-filter-verifier daemon start --no-cleanup &
sleep 5
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipeline1.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id
logstash-filter-verifier daemon run --loglevel DEBUG --pipeline /usr/share/logstash/config/pipeline2.yml --pipeline-base /usr/share/logstash/ --testcase-dir /usr/share/logstash/tests/test1.yml --add-missing-id

echo "Should see csv codec twice below:"
cat /tmp/lfv-*/session/*/lfv_inputs/1/input.conf | grep -A 10 "input {"
EOT

EOF

CMD ["/bin/bash", "/usr/share/logstash/run_tests.sh"]

Build and run with:

DOCKER_BUILDKIT=1 docker build --tag test .
docker run --rm test

Am I doing something wrong here? As an aside, is there a better way of testing with the CSV codec when multiple separate lines are required to get the headers?
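
For what it's worth, one alternative I can think of, assuming the csv codec accepts the same columns option as the csv filter (I haven't verified that), is to declare the column names up front so that no separate header line is needed:

input {
        s3 {
                codec => csv {
                        # Assumed option: name the columns explicitly instead
                        # of autodetecting them from the first line.
                        columns => ["field1", "field2", "field3"]
                        skip_empty_columns => true
                }
                id => "input"
        }
}
output { stdout {} }

That way every line would parse on its own, and the header-priming first testcase (with expected: []) wouldn't be needed.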

jgough commented 3 years ago

I want to point out that it isn't just the presence of an empty file that triggers this; if you move the output {} section into the new file, the issue still occurs. I just found it odd that the presence of an empty file alters the behaviour! Thanks!
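
To be concrete, this split of pipeline2 still reproduces it:

/usr/share/logstash/pipeline/pipeline2/01-input.conf:
input {
        s3 {
                codec => csv {
                        autodetect_column_names => true
                        skip_empty_columns => true
                }
                id => "input"
        }
}

/usr/share/logstash/pipeline/pipeline2/99-output.conf:
output { stdout {} }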

breml commented 3 years ago

@jgough Thanks again for your detailed bug report. I can confirm the existence of this bug in beta 1. In current master the bug is gone, so look forward to the next beta release. I verified this with the altered integration tests in #140.