logstash-plugins / logstash-input-s3


Large files are very slow to read locally #219

Open · yogevyuval opened this issue 3 years ago

yogevyuval commented 3 years ago

Trying to read a 15 MB gzipped file (200 MB uncompressed) with Logstash 7.9.2 shows strange behaviour. I also tried reading plain (uncompressed) files and it doesn't seem to make a difference.

  1. It takes about a minute for Logstash to read the file entirely (running on an EC2 machine in the same region as the bucket).
  2. Reading the same file locally with Python or bash takes about 5 seconds (see the sketch below).
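
The local baseline was essentially a timed line-by-line read of the same gzipped file, something along the lines of this sketch (the path is illustrative, not the exact script I ran):

import gzip
import time

# Illustrative path: a ~15 MB gzipped log file (~200 MB uncompressed).
path = "/tmp/logstash/sample1.gz"

start = time.monotonic()
line_count = 0
with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as f:
    for _ in f:
        line_count += 1
elapsed = time.monotonic() - start

print(f"read {line_count} lines in {elapsed:.1f}s")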

Looking at the debug logs, the download part is really fast and the local processing is very slow:

[2021-01-12T14:45:50,258][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample1.gz"}
[2021-01-12T14:45:50,259][DEBUG][logstash.inputs.s3] Downloading remote file {:remote_key=>"logs/sample1.gz", :local_filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:45:50,572][DEBUG][logstash.inputs.s3] Processing file {:filename=>"/tmp/logstash/sample1.gz"}
[2021-01-12T14:46:40,435][DEBUG][logstash.inputs.s3] Processing {:bucket=>"test", :key=>"logs/sample2.gz"}

The download itself took roughly 300 ms (14:45:50,259 to 14:45:50,572), while the local processing of that one file took about 50 seconds (14:45:50,572 until the next file starts at 14:46:40,435).

I tried looking into the plugin's source code and removing parts of it to pinpoint the problem. Eventually I removed almost every line of the process_local_log function, keeping only the codec decoding (plain by default) and the queue << event line. That remaining code is where most of the time is spent.

Any idea what could be the cause of this? It makes the plugin almost unusable in high-volume use cases, for example forwarding flow logs.

kaisecheng commented 3 years ago

Could you share your pipeline config? What is the output plugin? Do you see the same problem in 7.12?

yogevyuval commented 3 years ago

> Could you share your pipeline config? What is the output plugin? Do you see the same problem in 7.12?

The output plugin was the file output plugin, and we haven't tried it on 7.12. Unfortunately I don't have the exact pipeline config, but it was very simple (read from S3, write to a file), roughly like the sketch below.
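
Reconstructed from memory (not the exact config; the region and output path are placeholders):

input {
  s3 {
    bucket => "test"          # bucket name from the debug logs above
    prefix => "logs/"
    region => "us-east-1"     # placeholder; whatever region the EC2 machine was in
  }
}

output {
  file {
    path => "/tmp/logstash-out/events.log"   # placeholder output path
  }
}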