janmg / logstash-input-azure_blob_storage

This is a plugin for Logstash to fetch files from Azure Storage Accounts

Split the long json log files #35

Open · sree0744 opened 1 year ago

sree0744 commented 1 year ago

Hello,

We tried the azure_blob_storage input plugin with the "json" codec. Our requirement is to split a long json file into smaller units and then process them. We were looking for an option like "break_json_down_policy" in this plugin. Currently we get the result shown below.

Event #1:

    {
        "Computer": "*", "LogEntrySource": "stderr", ... output omitted ...
    }
    {
        "Computer": "*", "LogEntrySource": "stderr", ... output omitted ...
    }

What we are trying to achieve is shown below:

Event #1:

    {
        "Computer": "*", "LogEntrySource": "stderr", ... output omitted ...
    }

Event #2:

    {
        "Computer": "*", "LogEntrySource": "stderr", ... output omitted ...
    }

janmg commented 1 year ago

This plugin is not azureblob. That plugin had a break_json_down_policy option, but that config was made obsolete. https://github.com/Azure/azure-diagnostics-tools/blob/eaeed2866899169016786f0feaab3cd996a3b55d/Logstash/logstash-input-azureblob/lib/logstash/inputs/azureblob.rb#L113

This plugin is an input plugin that can read one json file and process it as a whole. If the logtype is NSG Flowlogs, the plugin can deal with a growing json file in which new blocks are inserted, which means the head and tail of the file need to be respected as part of the json parsing.

For other logtypes, the logtype has to be set to raw and the codec to json or json_lines. The whole file is then read as a single json event and handed off to the filter block, where you can split the file into smaller events, for example with a pipeline like the sketch below.
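A minimal sketch of that setup: the storage settings are placeholders, and the split filter's field is an assumed name ("records"), which you would adjust to whatever array field your decoded JSON actually contains:

```
input {
  azure_blob_storage {
    # Placeholder storage settings, substitute your own
    connectionstring => "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
    container => "logs"
    logtype   => "raw"
    codec     => "json"
  }
}

filter {
  # The split filter emits one event per element of the given array field.
  split {
    field => "records"   # assumed field name, depends on your JSON structure
  }
}

output {
  stdout { codec => rubydebug }
}
```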

If you have to implement something special because of your file format, you can integrate it around line 275, where the logtypes are checked, along the lines of the sketch below. https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L275
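A rough sketch of what such an extra branch could look like, purely illustrative: the logtype value, the variables in scope (chunk, queue) and the naive regex split are assumptions, not the plugin's actual internals.

```ruby
# Hypothetical extra logtype branch (names and structure illustrative only)
if @logtype == "concatenated_json"
  # Naively split back-to-back top-level JSON objects on a "}{" boundary;
  # this breaks if "}{" ever appears inside a string value.
  chunk.split(/(?<=\})\s*(?=\{)/).each do |part|
    @codec.decode(part) do |event|
      decorate(event)   # standard Logstash input helper to add type/tags
      queue << event    # hand each object off as its own event
    end
  end
end
```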

sree0744 commented 1 year ago

Thanks @janmg for the help on this. Like azureblob, we were hoping logstash-input-azure_blob_storage could split the input file. Anyway, we shall look at the filter plugin to split the file into smaller events.

Thanks