janmg / logstash-input-azure_blob_storage

This is a plugin for Logstash to fetch files from Azure Storage Accounts

Not all events are parsed #24

Closed: luigiromano24 closed this issue 2 years ago

luigiromano24 commented 2 years ago

Hi, nice to meet you, and congratulations on the plugin, it's very cool!

I'm reaching out to you because I've put in place a Logstash pipeline with your plugin to ship logs from SAP Commerce Cloud (Azure Blob Storage) to a custom Elasticsearch. Here's my pipeline:

```
input {
  azure_blob_storage {
    storageaccount => ""
    access_key     => ""
    container      => "commerce-logging"
    interval       => 300
  }
}

output {
  elasticsearch {
    hosts    => ["HOST"]
    index    => "test-index"
    user     => "ES_USER"
    password => "ES_USER"
  }
  stdout { codec => rubydebug }
}
```

From the logs I can see that only a few events are processed (and then sent to my custom Elasticsearch). Here's what the logs show: `processed 1 events, saving 452 blobs and offsets to remote registry data/registry.dat`

I've also enabled debug mode for this issue with the following settings: `... debug_timer => true debug_until => 100 ...`

I'm attaching the logs.

I hope you can help me with this!

Kind Regards, Luigi

logstash_issue_azure_blob_storage.log

janmg commented 2 years ago

When the plugin starts for the first time, it does a directory listing and remembers all the files in a registry file. The default policy is to resume and not process files it has already seen; if nothing was registered before, it starts fresh. Since the plugin has already run, you can either remove the registry to start over, or set the policy to start_over.
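Roughly, forcing a full reprocess would look like this; I'm assuming here that the policy option is called `registry_create_policy`, so double-check the README of the version you have installed:

```
input {
  azure_blob_storage {
    storageaccount => "ACCOUNT"
    access_key     => "KEY"
    container      => "commerce-logging"
    # assumed option name; "start_over" reprocesses all blobs instead of resuming from the registry
    registry_create_policy => "start_over"
  }
}
```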

The default input format is assumed to be JSON. The plugin tries to learn the JSON structure of the input so that it can ignore the start and end of the file; that way, when JSON files grow, the new objects can still be parsed. You get 2 errors while trying to parse a file as JSON, which probably means your files are mostly line based. In that case add `codec => line` to your input. The one event that did get processed must have contained just enough JSON for the parser to make sense of it.
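With the placeholder values from your config, a line-based input would look roughly like this (`codec => line` is a standard Logstash codec):

```
input {
  azure_blob_storage {
    storageaccount => ""
    access_key     => ""
    container      => "commerce-logging"
    interval       => 300
    # emit one event per line instead of trying to parse whole blobs as JSON
    codec          => line
  }
}
```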

Originally I wrote the plugin to deal with NSG flowlogs, but any other Azure file should work too. The plugin can only work with one type of logfile at a time, so if you have multiple file types, it takes some configuration with file filters to separate the pipelines and keep them from writing to the same registry.
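A rough sketch of one such separated pipeline; I'm assuming options named `path_filters` and `registry_path` here, so verify the names against the README of your installed version:

```
# pipeline for JSON blobs only, with its own registry
input {
  azure_blob_storage {
    storageaccount => "ACCOUNT"
    access_key     => "KEY"
    container      => "commerce-logging"
    path_filters   => ["**/*.json"]              # assumed option: only pick up .json blobs
    registry_path  => "data/registry-json.dat"   # assumed option: a separate registry per pipeline
  }
}
```

A second pipeline for the other file type would mirror this, with its own filter pattern and its own registry path.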

If it still doesn't work with these tips, paste some lines of a log file, so I can recreate your situation.

luigiromano24 commented 2 years ago

Hi Jan, thank you for your input, very useful. `codec => line` helped me. It will of course send JSON logs as plain strings, but I can implement a custom Ruby script to parse the JSON before sending it to Elasticsearch, so that in Kibana I can filter the results by individual JSON keys.
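For example, something along these lines in the filter section, using the standard json filter instead of hand-written Ruby (just a sketch; `message` as the source field is an assumption about how the events arrive):

```
filter {
  # parse the raw line into structured fields when it is valid JSON,
  # and keep the event as a plain string otherwise
  json {
    source               => "message"
    skip_on_invalid_json => true
  }
}
```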

I've also tested with `codec => json`, but no luck; it seems the logs are not parsed.

Best Regards!

janmg commented 2 years ago

Logstash also has a json_lines codec that treats individual lines as JSON, but using a little Ruby script is a good idea; that's how I started my plugin, as 5 lines of code to filter events. Either way, good luck with your project!