janmg / logstash-input-azure_blob_storage

This is a plugin for Logstash to fetch files from Azure Storage Accounts

Add a filename field in the message #10

Closed Xaaame closed 3 years ago

Xaaame commented 4 years ago

Hello !

I have a question for you: I need to filter my data by the names of my different files. How can I do this? I see in the TODO list:

show file path in logger

add filepath as part of log message

So I don't think there is an option to solve my problem yet, but maybe I can manage it another way. I have already tried to grok the path_filters like: grok { match => [ "path_filters", "%{GREEDYDATA:filename}" ] } but this was not conclusive.

Thx

janmg commented 4 years ago

I don't completely follow your question, but if you want the Logstash input plugin to read and monitor only some of the directories or files in your container, you can use path_filters; that is an input configuration, though. With path_filters it's possible to process only the log and text files in all directories by setting ['**/*.log', '**/*.txt']; it will then skip processing other files.
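As a sketch, an input block using path_filters might look like this (the connectionstring and container values are placeholders; parameter names follow the plugin's README):

```
input {
    azure_blob_storage {
        connectionstring => "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
        container => "logs"
        # only process .log and .txt files, in any directory of the container
        path_filters => ['**/*.log', '**/*.txt']
    }
}
```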

The configuration parameter path_filters is a direct copy from the original azureblob input plugin, which didn't scale well for me, so I wrote my own version of the plugin, but I kept the path_filters parameter that they have. https://github.com/Azure/azure-diagnostics-tools/tree/master/Logstash/logstash-input-azureblob#optional-parameters

Technically it uses JRuby's File.fnmatch to do the filtering. The examples there may help you build more complicated filters. https://ruby-doc.org/core-2.5.5/File.html#method-c-fnmatch-3F
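A quick way to check how a pattern will behave is to try File.fnmatch directly in (J)Ruby; the paths below are made-up examples:

```ruby
# With File::FNM_PATHNAME, '*' stops at '/' while '**/' descends into subdirectories,
# which is the behaviour path_filters relies on.
puts File.fnmatch('**/*.log', 'logs/app/run1.log', File::FNM_PATHNAME)   # true
puts File.fnmatch('**/*.log', 'logs/app/readme.txt', File::FNM_PATHNAME) # false
puts File.fnmatch('*.log', 'logs/run1.log', File::FNM_PATHNAME)          # false: '*' does not cross '/'
```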

The input plugin reads the files and, if the codec is set to line, every line is sent internally as JSON inside a "message" field to the filter block, like this: {"message": "line1 found 7 things in file interestingfile.txt"}. With grok you can then take the message field that contains the whole event and map the line into the variables you're interested in:
grok { match => ['message', '%{WORD:line} found %{NUMBER:errors:int} things in file %{NOTSPACE:filename}'] } (NOTSPACE rather than WORD, so the filename's extension is captured too), which will result in an event sent to the output plugin with the extracted fields added, something like { "message": "line1 found 7 things in file interestingfile.txt", "line": "line1", "errors": 7, "filename": "interestingfile.txt" }
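Put together as a filter block (the pattern and field names mirror the example above):

```
filter {
    grok {
        # NOTSPACE is used for the filename so the '.txt' extension is captured too
        match => ['message', '%{WORD:line} found %{NUMBER:errors:int} things in file %{NOTSPACE:filename}']
    }
}
```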

But I only created the azure_blob_storage input plugin, not the magic of the grok filter

Xaaame commented 4 years ago

My question is: how can I get the name of each processed file, by adding a field to the message for example? I want to display the data in Kibana per file, not for all the files contained in the blob container together.

When we use a local input, we do this for example: grok { match => [ "path", "%{GREEDYDATA:filename}" ] }. But there is no "path" field in the events from this plugin, so I have no way to recover the names of the files contained in the blob container.

Thx for your help

janmg commented 4 years ago

Now I understand. The filename is available in a variable 'name', but the message goes through the decorator without the filename and then into the queue. I don't have much time, but when I do I can try to add an option to put the filename in a meta field, so that you have access to the filename in the filter block.

https://github.com/janmg/logstash-input-azure_blob_storage/blob/0003e8af95137aeeba91905d10868f54181b0182/lib/logstash/inputs/azure_blob_storage.rb#L262

janmg commented 4 years ago

So I'll add something like event.set('filename', name) just after the decorate call.
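As a rough sketch of the intended behaviour (the Event class below is only a stand-in for illustration; in Logstash this is LogStash::Event, and decorate comes from the base input class):

```ruby
# Minimal stand-in for LogStash::Event, just to show what event.set does.
class Event
  def initialize(data = {})
    @data = data
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Stub for the decorator; in the real plugin this applies add_field / tags
# from the input configuration.
def decorate(event)
end

name = "interestingfile.txt"  # the blob name the plugin is currently processing
event = Event.new("message" => "line1 found 7 things in file interestingfile.txt")
decorate(event)
event.set('filename', name)   # the proposed addition, just after decorate
puts event.get('filename')    # interestingfile.txt
```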

https://www.elastic.co/guide/en/logstash/7.9/input-new-plugin.html

Xaaame commented 4 years ago

Can you write a message in this issue when you create this ?

Thx a lot !

janmg commented 3 years ago

0.11.5 has been pushed. It has a new option, addfilename => true, which will add the full filepath of each processed file to the event.
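With 0.11.5 the input block then becomes something like this (connection details are placeholders):

```
input {
    azure_blob_storage {
        connectionstring => "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
        container => "logs"
        # adds the full filepath of the processed blob to each event
        addfilename => true
    }
}
```

The added field can then be used in the filter block or in Kibana to split the data per file.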

janmg commented 3 years ago

addfilename => true should do the trick