logstash-plugins / logstash-output-google_cloud_storage

Apache License 2.0

Empty file check for GZip content-type #55

Open sawarkarma opened 1 month ago

sawarkarma commented 1 month ago

We are evaluating the GZip content-type to reduce network latency between Logstash and Google Cloud Storage, and we found an issue with empty files: when there is no content to flush within the configured interval, the plugin still creates an empty file in Google Cloud Storage. Does anyone have a workaround for this? I will also propose a change; please approve it or suggest a better solution or workaround.

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version): 8.14.3
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker): expanded from tar or zip archive
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes): command line
  4. How was the Logstash Plugin installed: bin/logstash-plugin install /path/to/gem/file.gem

JVM (e.g. java -version): temurin-11

If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or using the 'no-jdk' version in 7.10 (or higher), please provide the following information:

  1. JVM version (java -version)
  2. JVM installation source (e.g. from the Operating System's package manager, from source, etc).
  3. Value of the JAVA_HOME environment variable if set.

OS version (uname -a if on a Unix-like system): 23.6.0 Darwin

Description of the problem including expected versus actual behavior: Expected behaviour: once the interval has elapsed and there is no content to flush to the GCS bucket, the plugin should skip the upload and rotate the temp file. Actual behaviour: once the interval has elapsed and the temp GZip file has no content, the plugin still writes the empty file to the GCS bucket.
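To illustrate the kind of check being asked for, here is a minimal sketch in Python. It is not the plugin's code (the plugin is written in Ruby) and not necessarily what the proposed change does; `gzip_has_payload` and `should_upload` are hypothetical names. One detail worth noting: a GZip file with no content is not zero bytes on disk, because the header and trailer are always written, so a plain file-size check will not catch it. The sketch assumes the temp file has already been closed by the writer before it is inspected.

```python
import gzip
import os
import tempfile


def gzip_has_payload(path: str) -> bool:
    """Return True if the gzip file at `path` decompresses to at least one byte."""
    # A zero-byte file cannot contain any payload.
    if os.path.getsize(path) == 0:
        return False
    # An "empty" gzip still has a header and trailer, so try to read one
    # decompressed byte instead of trusting the on-disk size.
    with gzip.open(path, "rb") as fh:
        return fh.read(1) != b""


def should_upload(path: str) -> bool:
    """Hypothetical pre-upload gate: skip files that carry no events."""
    return gzip_has_payload(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty_path = os.path.join(tmp, "empty.log.gz")
        with gzip.open(empty_path, "wb"):
            pass  # nothing flushed during the interval

        full_path = os.path.join(tmp, "full.log.gz")
        with gzip.open(full_path, "wb") as fh:
            fh.write(b'{"message": "hello"}\n')

        # The empty gzip is larger than zero bytes on disk.
        print("empty gzip size > 0:", os.path.getsize(empty_path) > 0)
        print("upload empty file:", should_upload(empty_path))
        print("upload full file:", should_upload(full_path))
```

A cheaper alternative would be to count the events written to the current temp file and skip the upload when the count is zero, which avoids reopening the file at rotation time.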

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including (e.g.) pipeline definition(s), settings, locale, etc. The easier you make for us to reproduce it, the more likely that somebody will take the time to look at it.

  1. Use the output plugin configuration below:

         google_cloud_storage {
           bucket => "bucket-abc"
           temp_directory => "/tmp/"
           log_file_prefix => "AnyPrefix"
           max_file_size_kbytes => 5120
           max_concurrent_uploads => 5
           codec => plain { format => "%{message}" }
           output_format => "json"
           date_pattern => "%Y-%m-%dT%H-%M-00"
           flush_interval_secs => 5
           gzip => true
           gzip_content_encoding => false
           uploader_interval_secs => 60
           include_uuid => true
           include_hostname => true
         }

  2. Install the google_cloud_storage plugin

  3. Start Logstash with the given configuration.

  4. Wait for some time; empty GZip files will be created in the GCS bucket.

Provide logs (if relevant): none as of now

sawarkarma commented 1 month ago

Proposing a change: https://github.com/logstash-plugins/logstash-output-google_cloud_storage/pull/56. Please review if possible.

sawarkarma commented 1 month ago

Please help us resolve this issue.