logstash-plugins / logstash-output-s3

Apache License 2.0

Added `filename` as a configuration option. #215

Open barqshasbite opened 4 years ago

barqshasbite commented 4 years ago

This new option can be used to specify how files are named when uploaded to S3. It supports Logstash string interpolation (sprintf), so it can be used to generate unique filenames based on the data from an event.
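As an illustration of what sprintf-style interpolation means here, the following is a minimal standalone sketch, not the plugin's or Logstash's implementation. The `interpolate` helper and the sample event fields are hypothetical, and real Logstash events support richer syntax (nested fields, date formats) that this sketch ignores.

```ruby
# Hypothetical helper (not part of the plugin): replaces each %{field}
# reference in a template with the matching value from an event that is
# represented here as a plain Hash. Unknown fields are left untouched.
def interpolate(template, event)
  template.gsub(/%\{([^}]+)\}/) do
    key = Regexp.last_match(1)
    event.key?(key) ? event[key].to_s : "%{#{key}}"
  end
end

# Assumed sample event; the field names are made up for illustration.
event = { "customer" => "acme", "order_id" => 1042 }
puts interpolate("%{customer}-%{order_id}.json", event)
```

Two events with different field values produce different names, which is what makes an interpolated filename usable for write-once uploads.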

Addresses #134

Forbzy commented 4 years ago

@barqshasbite I was looking for this functionality. I'm pleased to see someone was working on it. It would be very useful to be able to set file names. Can you say what the current conflict issues are?

BillYoungman commented 4 years ago

I noticed that the last commit for this feature failed on December 19, 2019. Has anything been attempted since then? From what I'm seeing, I'm not the only one who is interested in this feature.

Thanks, Bill

barqshasbite commented 4 years ago

If I remember correctly, the build initially failed for reasons not related to my change (other builds were also failing at the time).

Since then, I have not revisited this change. In its current state, it is working for my limited use case, so I did not take it any further. My use case is uploading single, write-once JSON files to S3 with a unique name.

I put up the pull request knowing that other people were interested in the feature and may want to use my variation. It is incomplete, though, in that it does not support the size_file and time_file configuration options for rolling over filenames with a partN filename component. It will always upload with the same filename, so it could potentially overwrite existing files if you do not have unique naming set up. Adding support for the partN filename component would round this out and make it a more complete feature.
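The overwrite risk and the missing partN component described above can be illustrated with a small standalone sketch. `PartNamer` and its methods are hypothetical names that do not come from the plugin. The sketch only shows how a counter appended to a fixed base name keeps successive uploads from colliding.

```ruby
# Hypothetical sketch (not the plugin's code): appends a part counter to a
# user-supplied base name so that each rotation produces a distinct S3 key.
class PartNamer
  def initialize(base_name, extension)
    @base_name = base_name
    @extension = extension
    @counter = 0
  end

  # Name for the file currently being written.
  def current_name
    "#{@base_name}.part#{@counter}.#{@extension}"
  end

  # Called after an upload; the next file gets the next part number.
  def rotate!
    @counter += 1
    current_name
  end
end

namer = PartNamer.new("orders", "json")
puts namer.current_name
namer.rotate!
puts namer.current_name
```

Without the counter, both calls would return the same key and the second upload would replace the first.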

BillYoungman commented 4 years ago

Our company processes and calculates Sales & Consumer Use Taxes for our clients, so on average we get about 10,000 transactions per second. These are moved into Elasticsearch indexes via Filebeat, and in addition we move the transactions into AWS S3 buckets for use in calculating client metric data. The default S3 naming convention of 'ls.s3.xxx' was making it difficult to work with those files, hence our need for custom naming, so I was really glad when I came across this feature request.

I took your code and, after creating a new version of the plug-in in our development environment, copied it into that new local version and modified my logstash.yml file to point to this local plug-in. I then made the following modifications to the temporary_file_factory.rb file:

    # name = filename == "" ? generate_name : filename
    name = filename == "" ? generate_name : generate_custom_name(filename)

Created new method:

    def generate_custom_name(filename)
      filename = "#{filename}.#{SecureRandom.uuid}.#{current_time}"

      if tags.size > 0
        "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
      else
        "#{filename}.part#{counter}.#{extension}"
      end
    end

I ran it through extensive testing using JMeter, and for us it is working like a charm. I have noticed a couple of things, though, and they apply to both the original version and my version of the plug-in: neither one honors the size_file setting. We think it is more a case of data coming in so fast that, by the time the rotate / upload is triggered, the file is already larger than the size set in logstash.conf. Both honor the time_file value.
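One way to picture the overshoot described above: if the size check only runs between batches of writes, the file may already have grown past the threshold by the time the check fires. The sketch below is a hypothetical standalone model, not the plugin's code. `SizePolicy`, `size_at_rotation`, the batch size, and the byte counts are all assumed for illustration.

```ruby
# Hypothetical model of a size-based rotation check (not the plugin's code).
class SizePolicy
  def initialize(size_limit)
    @size_limit = size_limit
  end

  # True once the file has reached or passed the limit.
  def rotate?(file_size)
    file_size >= @size_limit
  end
end

# Writes fixed-size batches and consults the policy between batches.
# Returns the size the file had when rotation was finally triggered.
def size_at_rotation(policy, batch_bytes)
  size = 0
  size += batch_bytes until policy.rotate?(size)
  size
end

# Assumed numbers: a 1,000-byte limit and 300-byte batches.
policy = SizePolicy.new(1_000)
puts size_at_rotation(policy, 300)
```

With these assumed numbers the file is rotated at 1,200 bytes rather than 1,000, because the limit is crossed in the middle of a batch. Larger batches or less frequent checks widen that gap.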

I apologize for the verbosity of this post but I wanted to put some context around what I did and what we saw.

Forbzy commented 4 years ago

It is good to hear work on this feature is continuing. @BillYoungman, is your version of the plugin still up to date with new changes to the Master branch? For my use case I would need the time_file option to work, because I'd need to output data hourly and daily. Does this option work for your version, @BillYoungman?

BillYoungman commented 4 years ago

@Forbzy it is still in my local file system. This is my first attempt at doing any work like this, and I wasn't entirely sure of the process involved to officially work in here, so I didn't want to do anything that might be unauthorized. That being said, a while back I did sign the letter to become a contributing developer.

Let me do some more testing focusing on the time_file property in particular and will post my findings.

BillYoungman commented 4 years ago

@barqshasbite not sure if this is the correct place for questions. If it isn't, please direct me to the right place. Thanks. Adding @Forbzy

But here goes: time_file is working, but upon closer testing of just the size_file setting (I was using size_and_time in my earlier testing, which was actually masking the size variable), it is not working for the call to generate_custom_name(filename). It never rolls the files over. However, when I let the plugin use the default naming method call, generate_name, it works fine.

When I run my tests with the default method call I see calls to

    if @rotation.rotate?(temp_file)
      @logger.info("Rotate file",
                   :strategy => @rotation.class.name,
                   :key => temp_file.key,
                   :path => temp_file.path)

      upload_file(temp_file)
      factory.rotate!
    end

When I set a custom filename, this is not being called at all. So my question is: where and how does the default naming call trigger this code, and why doesn't the custom naming call?

Been struggling with this all day and it isn't helping that this is my first foray into Ruby as well.

Thanks, Bill

Forbzy commented 3 years ago

@BillYoungman Is there no way of just allowing us to set the filename ourselves? For instance, you can do this with the file output plugin when setting the path.

yogevyuval commented 3 years ago

@Forbzy @BillYoungman @barqshasbite This feature is very much needed. The plugin can be useless without it for many use cases, as can be seen in the different threads.

Any update about this PR? Is there something that can be done to make this happen? Happy to help if needed.

webminster commented 3 years ago

I'd like to upvote this as well... I'm trying to use Logz.io with its S3 bucket shipper, and it wants S3 file names to be in ascending sort order. With the random name, I can't honor that requirement.
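For the ascending sort order requirement, a name that leads with a fixed-width UTC timestamp sorts lexicographically in chronological order. A minimal sketch, assuming a hypothetical `sortable_name` helper that is not part of the plugin:

```ruby
# Hypothetical helper (not part of the plugin): builds a name whose
# lexicographic order matches chronological order by leading with a
# fixed-width UTC timestamp, followed by a zero-padded part number.
def sortable_name(time, part, extension)
  stamp = time.getutc.strftime("%Y%m%dT%H%M%S")
  format("%s.part%04d.%s", stamp, part, extension)
end

# Assumed sample times for illustration.
earlier = sortable_name(Time.utc(2021, 3, 9, 8, 5, 0), 0, "txt")
later   = sortable_name(Time.utc(2021, 11, 20, 14, 0, 0), 0, "txt")
puts earlier
puts later
puts [later, earlier].sort == [earlier, later]
```

The zero padding matters: without it, month 11 would sort before month 3 and part 10 before part 2.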

yjagdale commented 1 year ago

@barqshasbite - looks like this plugin is no longer maintained. Can you please share a gem file that people can install manually and use?