If an output file is deleted (in our case, as a side effect of being gzipped), Logstash continues to hold the file open. Over time these zombie files fill up the filesystem until Logstash is restarted.
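This is ordinary Unix unlink semantics: the kernel frees an unlinked file's blocks only once the last open descriptor on it is closed. A minimal sketch, independent of Logstash, that reproduces the symptom:

```sh
dd if=/dev/zero of=/tmp/zombie.dat bs=1M count=100   # create a 100 MB file
tail -f /tmp/zombie.dat > /dev/null &                # hold an open descriptor on it
rm /tmp/zombie.dat                                   # unlink the name; space is NOT freed yet
df -h /tmp                                           # still shows the 100 MB in use
lsof -nP | grep '(deleted)'                          # the unlinked file shows up here
kill %1                                              # close the descriptor; df now drops
```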
Version: 5.5.1
Operating System: Ubuntu 16.04 (xenial)
Config File (if you have sensitive info, please remove it):
Input is a group of Kafka queues/topics.
The output file path is defined as:
path => "${OUTPUTDIR}/%{kf_topic}/%{table_name}-%{kf_topic}-%{logstash_host}-%{+YYYY-MM-dd-HH}.json"
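For reference, a minimal sketch of the pipeline config. Only the output path above was shared; the kafka input settings below are assumptions, and the `%{kf_topic}`, `%{table_name}`, and `%{logstash_host}` fields are set by filters earlier in our pipeline:

```
input {
  kafka {
    # Assumed settings; the real broker and topic list were omitted from this report.
    bootstrap_servers => "kafka01:9092"
    topics => ["device_logs_a", "device_logs_b"]
  }
}

output {
  file {
    path => "${OUTPUTDIR}/%{kf_topic}/%{table_name}-%{kf_topic}-%{logstash_host}-%{+YYYY-MM-dd-HH}.json"
  }
}
```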
Steps to Reproduce:
1. Configure the Kafka input and file output as above. The Kafka queues carry log data from devices, so each hourly output file can grow quite large.
2. A cron job runs at 4 minutes past every hour. For each output file that is not from the current hour, the job moves the file to a name with the current date appended (`YYmmdd_HHMMSS`), compresses it with gzip, then transfers it to Amazon S3 storage (a hypothetical reconstruction of the job is sketched below).
3. Over time, if Logstash is not restarted, the filesystem appears to fill up because of the many deleted files still held open by the Logstash process.
4. After files have been deleted, run `lsof -nP -p $(pgrep -d , java) | grep '(deleted)'` to list them. `df` shows the space as still in use, yet the output of `du` does not include the sizes of these files.
5. Restart Logstash: the `lsof` command now shows zero deleted files, and `df` no longer reports the space consumed by the zombie files.
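The rotation job looks roughly like this (a hypothetical reconstruction; the script path, `find` logic, and S3 bucket are illustrative, not copied from our actual job):

```sh
#!/bin/bash
# Illustrative reconstruction of the hourly rotation job.
# Cron entry: 4 * * * * /usr/local/bin/rotate_logstash_output.sh
set -eu
STAMP=$(date +%y%m%d_%H%M%S)
HOUR_SUFFIX=$(date +%Y-%m-%d-%H)

# Rotate every output file that is not from the current hour.
find "${OUTPUTDIR}" -type f -name '*.json' ! -name "*${HOUR_SUFFIX}.json" |
while read -r f; do
  mv "$f" "${f%.json}-${STAMP}.json"    # rename; Logstash still holds the same inode open
  gzip "${f%.json}-${STAMP}.json"       # writes the .gz and unlinks the renamed file
  aws s3 cp "${f%.json}-${STAMP}.json.gz" "s3://example-bucket/logs/"  # bucket is illustrative
done
```

After the `gzip` step, the inode Logstash is writing to has no remaining directory entry, which is why it then appears in `lsof` as `(deleted)` while still consuming space.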