logstash-plugins / logstash-output-file

The stale files cleanup cycle does not run if no new events #88

Open luizgpsantos opened 4 years ago

luizgpsantos commented 4 years ago
input {
    file {
        path => "/tmp/input.json"
        codec => json
    }
}

output {
    file {
        path => "/tmp/output.json.gz"
        codec => "json_lines"
        gzip => "true"
    }
}
  1. Start Logstash in debug mode using the pipeline provided above and grep the logs for relevant events:
$ ./bin/logstash -f fileoutputtest.conf --debug

$ tail -f logstash-plain.log | egrep -i "stale|opening|closing"
  2. Write content to the input.json file:

$ while true; do echo '{"name":"app1"}' >> /tmp/input.json; sleep 4; done
  3. Observe that Logstash opens /tmp/output.json.gz and the stale check runs roughly every 10 seconds:
[2020-06-01T11:52:48,406][INFO ][logstash.outputs.file    ] Opening file {:path=>"/tmp/output.json.gz"}
[2020-06-01T11:52:48,445][DEBUG][logstash.outputs.file    ] Starting stale files cleanup cycle {:files=>{"/tmp/output.json.gz"=>#<IOWriter:0x7cd4733c @active=true, @io=#<Zlib::GzipWriter:0x7b4db1ec>>}}
[2020-06-01T11:52:48,448][DEBUG][logstash.outputs.file    ] 0 stale files found {:inactive_files=>{}}
[2020-06-01T11:53:00,339][DEBUG][logstash.outputs.file    ] Starting stale files cleanup cycle {:files=>{"/tmp/output.json.gz"=>#<IOWriter:0x7cd4733c @active=true, @io=#<Zlib::GzipWriter:0x7b4db1ec>>}}
[2020-06-01T11:53:00,339][DEBUG][logstash.outputs.file    ] 0 stale files found {:inactive_files=>{}}
  4. Stop writing to input.json and observe that stale checks are no longer performed (the log no longer prints any stale messages), so the output file is never closed.

Can we improve this behavior so that the stale check runs independently of incoming events? Thanks!
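
To make the report concrete, here is a simplified Ruby model of cleanup that is only reachable from the event-receiving path. It is a sketch, not the plugin's code: `EventDrivenWriter`, `receive`, and the injected clock are hypothetical names, and the two-step "mark inactive, then close" logic is an assumption based on the `@active` flag visible in the debug logs.

```ruby
# Simplified model of event-driven stale cleanup (hypothetical, not the plugin source).
class EventDrivenWriter
  attr_reader :open_files, :cleanup_cycles

  def initialize(interval, clock)
    @interval = interval        # seconds between cleanup cycles
    @clock = clock              # callable returning the current time in seconds
    @open_files = {}            # path => true if written since the last cycle
    @last_cleanup = clock.call
    @cleanup_cycles = 0
  end

  # Cleanup is reachable only from here, so it needs an incoming event to run.
  def receive(path)
    @open_files[path] = true
    close_stale_files
  end

  private

  def close_stale_files
    now = @clock.call
    return if now - @last_cleanup < @interval

    @cleanup_cycles += 1
    # Drop ("close") files that saw no write during the previous cycle.
    @open_files.delete_if { |_path, active| !active }
    # Mark the rest inactive; the next write marks them active again.
    @open_files.keys.each { |path| @open_files[path] = false }
    @last_cleanup = now
  end
end

now = 0
writer = EventDrivenWriter.new(10, -> { now })

# Events arrive every 4 seconds, as in the reproduction loop.
[4, 8, 12].each do |t|
  now = t
  writer.receive("/tmp/output.json.gz")
end
cycles_while_writing = writer.cleanup_cycles

# Events stop and ten minutes pass; nothing calls receive, so nothing is cleaned up.
now = 600
cycles_after_idle = writer.cleanup_cycles
still_open = writer.open_files.key?("/tmp/output.json.gz")
```

In this model the cycle count stays the same once events stop, and the file stays open no matter how much time passes, which matches step 4 above.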

TheVastyDeep commented 2 years ago

It would be a fairly trivial change to add a periodic_flush method (as the aggregate filter does), rename close_stale_files to flush, and remove the call to it from multi_receive_encoded.

That said, this bug is effectively fixed under the java_execution engine, because that engine sends an empty batch through the pipeline every 15 seconds, so multi_receive_encoded is called to process it and that closes stale files. But note that the empty batches are themselves considered an issue, and if that gets fixed this bug will reappear.
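
As a rough illustration of decoupling the cleanup from event delivery, the sketch below drives the cleanup cycle from a plain Ruby background thread. It deliberately swaps in a generic timer thread for Logstash's own flush mechanism, so it shows the idea rather than the change proposed above. `TimerDrivenWriter` and all of its methods are hypothetical, and the short interval is chosen only so the example finishes quickly.

```ruby
# Timer-driven stale cleanup (hypothetical sketch, independent of Logstash).
class TimerDrivenWriter
  def initialize(interval)
    @interval = interval   # seconds between cleanup cycles
    @open_files = {}       # path => true if written since the last cycle
    @lock = Mutex.new      # guards @open_files across writer and timer threads
    @running = false
  end

  # Writing only marks the file active; it no longer triggers cleanup.
  def receive(path)
    @lock.synchronize { @open_files[path] = true }
  end

  # One cycle: close files idle since the previous cycle, then mark the rest inactive.
  def cleanup_cycle
    @lock.synchronize do
      @open_files.delete_if { |_path, active| !active }
      @open_files.keys.each { |path| @open_files[path] = false }
    end
  end

  # Start a background thread that runs a cleanup cycle every @interval seconds.
  def start
    @running = true
    @timer = Thread.new do
      while @running
        sleep @interval
        cleanup_cycle
      end
    end
  end

  def stop
    @running = false
    @timer.join
  end

  def open_paths
    @lock.synchronize { @open_files.keys }
  end
end

writer = TimerDrivenWriter.new(0.05)
writer.receive("/tmp/output.json.gz")   # the last event ever received
writer.start

# Wait (with a safety deadline) until the timer has closed the idle file.
deadline = Time.now + 2
sleep 0.01 until writer.open_paths.empty? || Time.now > deadline
writer.stop

remaining = writer.open_paths
```

Because the timer thread and the writer share the file table, the sketch guards it with a mutex. A real change would also need to stop the thread when the plugin closes.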