janmg / logstash-input-azure_blob_storage

This is a plugin for Logstash to fetch files from Azure Storage Accounts

Issue reading timestamp #8

Closed sera4000 closed 2 years ago

sera4000 commented 4 years ago

Hi - I have been using your latest plugin version, which has been working very well - up until about a week ago, when the timestamp was no longer being extracted correctly. Here is an example of a log which was working fine until a week ago. The timestamp ('eventTime') is suddenly being output as 'random' months in year 2018:

{"request":[{"id":"|19e7d278-483a3c4282fab47b.","name":"GET Heartbeat/Get","count":1,"responseCode":200,"success":true,"url":"https://api.sanitised.com/api/heartbeat","urlData":{"base":"/api/heartbeat","host":"api.sanitised.com","hashTag":"","protocol":"https"},"durationMetric":{"value":3030.0,"count":1.0,"min":3030.0,"max":3030.0,"stdDev":0.0,"sampledValue":3030.0}}],"internal":{"data":{"id":"4ebe7c0f-cc3d-11fa-b4d5-71de291dc9ce","documentVersion":"1.61"}},"context":{"application":{"version":"1.0.0.0"},"data":{"eventTime":"2020-07-20T03:45:03.8916679Z","isSynthetic":false,"samplingRate":100.0},"cloud":{},"device":{"type":"PC","roleName":"api","roleInstance":"RE00145D02761B","screenResolution":{}},"user":{"anonId":"anonym","authId":"anonym","isAuthenticated":true},"session":{"isFirst":false},"operation":{"id":"18e4f228-483c4e4228feb54a","parentId":"18e4f228-483c4e4228feb54a","name":"GET Heartbeat/Get"},"location":{"clientip":"0.0.0.0","continent":"Europe","country":"United Kingdom"},"custom":{"dimensions":[{"httpMethod":"GET"},{"AspNetCoreEnvironment":"Production"}]}}}

My logstash input is like this:

input {
    azure_blob_storage {
        storageaccount => "somestorageaccount"
        access_key => "someaccesskey"
        container => "somecontainer"
        prefix => "live-serve-customer_dfad752da2e543c7bdf0b7474ddc7a34/Requests/"
        codec => "json_lines"
        registry_create_policy => "resume"
        interval => 3600
        debug_timer => true
        registry_local_path => '/usr/share/logstash/plugins'
        type => "cust-requests"
    }
}

Do you have any idea why the date might have stopped being processed correctly?

Thanks, Sera

janmg commented 4 years ago

My plugin doesn't do date parsing other than mapping fields for flowlogs. Date parsing happens in the filter block. Your event time contains fractional seconds down to 100-nanosecond precision, and I don't remember that being part of the ISO date format, so you probably need a custom parser to map all the fields. You can create a test pipeline that feeds in the single event you captured, test your date parser, and use debug output to print the result.
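Such a test pipeline's filter block might look like the sketch below. The field path is taken from the sample event above; whether the `ISO8601` pattern accepts the seven-digit fraction in `eventTime` is an assumption worth verifying with this exact setup:

```
filter {
    # Hypothetical sketch: parse context.data.eventTime into @timestamp
    date {
        match => [ "[context][data][eventTime]", "ISO8601" ]
        target => "@timestamp"
    }
}
output {
    # rubydebug prints the full parsed event, so you can see
    # whether @timestamp matches eventTime
    stdout { codec => rubydebug }
}
```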

sera4000 commented 4 years ago

Took me a while to figure it out, but what actually happened is that the 'resume' started again from the beginning of my (huge) container. The BLOB files in my container start in Feb 2018. :)

Not sure why it would have suddenly stopped resuming from the (local) registry file and started from the beginning. Maybe a corrupted registry file (200MB)?

I have deleted the registry file, set to 'start fresh', and restarted logstash.
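For anyone following along, "start fresh" presumably means switching the policy in the input block; a minimal sketch, assuming the plugin's option value for this is `start_over` (check the plugin README for the exact accepted values):

```
input {
    azure_blob_storage {
        # ... same storageaccount / access_key / container as before ...
        # ignore any existing registry and begin from the start of the container
        registry_create_policy => "start_over"
    }
}
```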

sera4000 commented 4 years ago

Just to let you know, after wiping the registry file and starting fresh, it's all working again. I do wonder if maybe Java crashed: 'top' shows Java using at most 89% CPU. Even though I have 16GB total memory, I have set

-Xms12g -Xmx12g

By the way - your plugin is amazing. Thanks again!

janmg commented 2 years ago

Assuming this had to do with a file not being read from the Azure store: sometimes the underlying Azure Ruby library fails to read and throws an exception. Later versions of my plugin retry these files.