joelw / event_hubs_for_splunk

An Event Hubs modular input for Splunk
MIT License

Duplicate events #1

Open IanMoroney opened 6 years ago

IanMoroney commented 6 years ago

We're seeing many duplicate events being indexed in Splunk when using this connector: 251 unique timestamps in an hour, but 200,000 events indexed. Have you had issues with duplicate events? We're firing the pull every 60 seconds.

joelw commented 6 years ago

Hi! I haven’t noticed this problem myself, but I could imagine that this might happen if the Input fails to create the epoch file after it finishes. In this case it will always download all saved events. Can you check if the epoch file is being created properly, and if there are any exceptions logged in splunkd.log? Unfortunately there isn’t any error handling around the epoch file, but this would be a good thing to add. If you’re not sure where to look, let me know and I’ll have a dig through the code.
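A minimal sketch of what error handling around the epoch file might look like. It assumes the file is a small JSON document of saved offsets (the `{}` contents reported later in this thread suggest JSON, but the real format is not confirmed here). The helper names `loadCheckpoint` and `saveCheckpoint` and the demo file name are hypothetical, not the input's actual code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: load saved offsets. A missing or corrupt file is
// logged and treated as "no checkpoint yet" instead of crashing the input.
function loadCheckpoint(file) {
    try {
        return JSON.parse(fs.readFileSync(file, "utf8"));
    } catch (err) {
        console.error("Could not read checkpoint " + file + ": " + err.message);
        return {};
    }
}

// Hypothetical helper: write to a temporary file and rename it into place,
// so a crash mid-write does not leave a half-written checkpoint behind.
function saveCheckpoint(file, offsets) {
    const tmp = file + ".tmp";
    try {
        fs.writeFileSync(tmp, JSON.stringify(offsets));
        fs.renameSync(tmp, file);
        return true;
    } catch (err) {
        console.error("Could not save checkpoint " + file + ": " + err.message);
        return false;
    }
}

// Demo against a throwaway file in the system temp directory.
const demoFile = path.join(os.tmpdir(), "event_hubs_demo_checkpoint.txt");
saveCheckpoint(demoFile, { "0": "1024" });
console.log(loadCheckpoint(demoFile));
```

Logging the failure rather than swallowing it is the point: a save that silently fails would produce the "download everything again" behaviour described above.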

IanMoroney commented 6 years ago

Aha interesting! Is there any information on that file? Like where it should exist or what the filename is?

joelw commented 6 years ago

I was worried you’d ask that! I’m traveling and don’t have access to Splunk on my computer. It will have a .txt extension and the name will be based on your input name. If your input has some non-ASCII characters it’s possible that it will cause problems, so if you can’t see any .txt files anywhere under the Splunk directory hierarchy, maybe try creating an input called “test” or similar?

IanMoroney commented 6 years ago

Perfect I'll give that a try! Thank you

IanMoroney commented 6 years ago

Ok, so I found two files in: /opt/splunk/var/lib/splunk/modinputs/event_hubs

called: nameofeventhub_defaultSourceType.txt nameofeventhub.txt

Both files contained just: {}

No other data inside the txt file.

File permissions as follows:

-rw-------. 1 root root 2 Jun 4 17:33 nameofeventhub_defaultSourceType.txt
-rw-------. 1 splunk splunk 2 Jun 19 10:42 nameofeventhub.txt

Splunk is running as the splunk user, not as root.

Should I change the owner and group of the first file and try again?

IanMoroney commented 6 years ago

I can see the timestamp for nameofeventhub.txt being updated, but it still only contains {}

IanMoroney commented 6 years ago

Changed the file owner and set the permissions to 777 with chmod for testing.

IanMoroney commented 6 years ago

Warnings generated in splunkd.log:

06-19-2018 09:40:41.337 +0100 WARN LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 17999 - data_source="event_hubs://nameofeventhub", data_host="nameofeventhub.servicebus.windows.net", data_sourcetype="EventHub"

joelw commented 6 years ago

Interesting! It sounds like the permissions are fine, but the input is crashing before running to completion, so it does not update the offsets. Even though the truncation warning is just a warning, perhaps the JavaScript SDK is raising an exception? Could you please try making the following change to the default_message_handler.js (or your own custom message handler)?

From:

       var _               = require('underscore');

       ...

       eventWriter.writeEvent(event);

To:

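       var _      = require('underscore');
       // Assumes ModularInputs (from the Splunk SDK) is already in scope in this file.
       var Logger = ModularInputs.Logger;

       ...

       try {
           eventWriter.writeEvent(event);
       } catch (err) {
           // Log the failure and the offending message body, then keep processing.
           Logger.warn("Unable to process message: " + err.message);
           Logger.warn("Message was: " + message.body);
       }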

It should automatically pick up the change the next time the input runs, or you can kill the appropriate Node.js process or restart Splunk to be extra sure.

I haven't tested this, but it should catch any exceptions raised by the writeEvent call, log the error message and the offending message's body, and then hopefully continue.

IanMoroney commented 6 years ago

event_hubs.zip

So it looks like event_hubs.js is different from the above, as eventWriter.writeEvent(event); does not exist. I've uploaded it so you can take a look. I can see eventWriter in two locations, but they are wrapped up in other things, so I wasn't sure whether I should add anything to them.

IanMoroney commented 6 years ago

I have however added it to default_message_handler.js as above so we'll see if this logs anything further.

joelw commented 6 years ago

Hi Ian, that's the one - default_message_handler.js is the one which actually does the writeEvent(). The idea is that you can write your own message handler if you need to do any transformation of the data on the way through. In the application I wrote this for, each Event Hub message contains telemetry for many different data types, and I have a special handler which splits each of these into separate {_time=? device=x signal=y value=z} events, which makes normal Splunking much easier. However, if you just want the whole event to go into the index as-is, you can use default_message_handler.
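As an illustration of that kind of splitting, here is a minimal sketch. The message shape (a `device`, a `time` and a `signals` map) and the function name `splitTelemetry` are assumptions made up for this example; the real custom handler is not shown in this thread, and a real handler would still need to write each resulting record out as a Splunk event.

```javascript
// Hypothetical message body: one device, one timestamp, many signals.
// Returns one flat record per signal, in the _time/device/signal/value
// form described above.
function splitTelemetry(body) {
    return Object.keys(body.signals).map(function (signal) {
        return {
            _time: body.time,
            device: body.device,
            signal: signal,
            value: body.signals[signal]
        };
    });
}

const events = splitTelemetry({
    device: "pump-1",
    time: 1530000000,
    signals: { temp: 21.5, rpm: 900 }
});
console.log(events);
```

One record per signal means each event carries a single value field, which is what makes the data easier to search and chart.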

Fingers crossed! Let me know how it goes.

IanMoroney commented 6 years ago

Left it running overnight, it blew our license :\

The only thing logged from tailing the txt file was:

{}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}tail: nameofeventhub.txt: file truncated {}

joelw commented 6 years ago

Bummer! Was anything logged into splunkd.log?

IanMoroney commented 6 years ago

Unfortunately not :\

joelw commented 6 years ago

I'm running out of ideas! Unfortunately I haven't been able to reproduce the issue, so depending on the nature/confidentiality of your data, if you don't mind giving me temporary access to your Event Hub I'd be happy to try it on my computer, come up with a fix, and delete any data of yours which I ingest. Feel free to email me at joel@joelw.id.au .

IanMoroney commented 6 years ago

No problem, cheers! Will test another couple of things before bothering you some more :)

IanMoroney commented 6 years ago

Nope :( just completion messages I believe

joelw commented 6 years ago

I've just tried adding a new Event Hubs input, and discovered that there was a terrible bug which was causing offsets to not be saved, which sounds exactly like the problem you were having! This should now be fixed.

In the default_message_handler, the offset was being deleted from the message (so that we don't bother saving the message in Splunk) but this meant that the offset could not be used. I didn't run into this problem earlier because I use a custom event handler, which doesn't do the same thing. The input now saves the offset in a variable before passing the message to the message handler.
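The ordering problem can be sketched as follows. This is not the input's actual code: the message fields (`partition`, `offset`, `body`) and the function names are hypothetical stand-ins chosen to show why reading the offset after the handler has run leaves the checkpoint empty.

```javascript
// Stand-in for a handler that strips the offset so it is not indexed,
// as the default handler is described as doing.
function strippingHandler(message) {
    delete message.offset;
    return JSON.stringify(message);
}

// Buggy order: the offset is read after the handler has already removed it,
// so nothing is ever recorded.
function processBuggy(message, handler, checkpoint) {
    handler(message);
    if (message.offset !== undefined) {
        checkpoint[message.partition] = message.offset;
    }
}

// Fixed order: capture the offset in a variable first, then hand the
// message to the handler.
function processFixed(message, handler, checkpoint) {
    const offset = message.offset;
    handler(message);
    checkpoint[message.partition] = offset;
}

const buggy = {};
processBuggy({ partition: "0", offset: "4096", body: "hello" }, strippingHandler, buggy);
console.log(JSON.stringify(buggy)); // {}

const fixed = {};
processFixed({ partition: "0", offset: "4096", body: "hello" }, strippingHandler, fixed);
console.log(JSON.stringify(fixed)); // {"0":"4096"}
```

The empty object in the buggy case matches the `{}` seen in the checkpoint files earlier in the thread.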

IanMoroney commented 6 years ago

Excellent! Thanks very much