splunk / splunk-sdk-csharp-pcl

Splunk's next generation C# SDK
https://dev.splunk.com/enterprise/docs/csharp
Apache License 2.0

ModularInput: Events in the EventWriter queue are lost when Splunk shuts down. #37

Closed brattonc closed 2 years ago

brattonc commented 9 years ago

I ran into this issue while working on a modular input that processes a large amount of data and therefore runs for an extended period of time. If Splunkd shuts down while a modular input is running and events are sitting in the EventWriter queue, those events are lost and never logged to Splunk. Additionally, the events are still sent to the IProgress<EventWrittenProgressReport> as if they had been written successfully. This makes it impossible to create a checkpoint for the file being processed, because there's no way to tell how many events actually made it to Splunk.

Steps to reproduce:

  1. Create a modular input that continuously logs events.
  2. Run the modular input using Splunk and let the EventWriter queue fill up with at least 100,000 events.
  3. Stop the Splunk Windows service while the modular input is running.

At this point Splunk will send a ctrl+break signal to the modular input. I have the modular input trap this signal and stop writing events. The EventWriter continues writing events to stdout and sending them to the progress reporter, but none of these events reach Splunkd. In fact, events written to stdout up to 500ms before the ctrl+break is received are lost as well (the amount of time varies from run to run).

brattonc commented 9 years ago

A fix I'd suggest is...

  1. Have Splunkd send a signal to the modular input via its stdin requesting that it shut down.
  2. The modular input framework would then discard everything in the event writer queue (without signaling the progress reporter) and signal back to splunkd that event writing is complete.
  3. The modular input framework would then fire an event so that checkpoint data can be saved.
  4. If the modular input doesn't terminate within 5 seconds, then have splunkd send the ctrl+break signal to it.

This isn't a simple fix as it requires changes to Splunkd, but I think that's unavoidable given the symptoms I've outlined in the issue (losing some events that are written even before the modular input receives the ctrl+break).
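To make steps 1 to 3 concrete, here is a rough sketch of the input-side half of such a handshake. It is written in Python for brevity rather than the SDK's C#, and everything in it is an assumption: the `shutdown` request line, the `done` reply, and the `run_shutdown_handshake` function are made up for illustration, and splunkd does not send or expect any of them today. Step 4 (the ctrl+break fallback after 5 seconds) would live in splunkd and is not shown.

```python
import queue


def run_shutdown_handshake(control_in, control_out, pending, on_checkpoint):
    """Wait for a hypothetical shutdown request, then drop the queue and reply.

    control_in / control_out are file-like objects standing in for the
    input's stdin and its reply channel. The "shutdown" and "done" lines
    are invented for this sketch and are not part of any Splunk protocol.
    Returns the number of discarded events, or None if the stream ended
    without a shutdown request.
    """
    for line in control_in:
        if line.strip() != "shutdown":
            continue  # ignore anything that is not the shutdown request

        # Step 2a: discard queued events without notifying the progress reporter.
        discarded = 0
        while True:
            try:
                pending.get_nowait()
            except queue.Empty:
                break
            discarded += 1

        # Step 2b: tell the parent process that event writing is complete.
        control_out.write("done\n")
        control_out.flush()

        # Step 3: give the input a chance to save checkpoint data.
        on_checkpoint(discarded)
        return discarded
    return None
```

Because nothing is written or reported after the request arrives, the last progress report the input saw would match what splunkd actually received, which is what the checkpoint needs.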

itay commented 9 years ago

@brattonc thanks for filing the bug. As you noted, there is a larger issue with the control mechanisms from splunkd. That's something we will look into, but it won't be fixed in the immediate future.

Do you think there is a bug in the C# Mod Input framework itself to fix, or is this just a symptom of the larger problem?

brattonc commented 9 years ago

IMO, the EventWriter sending events to the progress reporter when it could know that Splunkd is trying to terminate it (via ctrl+break) is a bug. Trapping this and dumping the event queue wouldn't fix the issue, but would at least mitigate some of the damage. I'd rather lose a couple hundred events than 100k+ as it is right now.

I understand if this isn't a desirable change, as it's a stopgap and not a true fix. As an alternative, a mechanism for injecting some other implementation of an EventWriter into the ModularInput class would let me implement the ctrl+break trap and queue dumping myself.
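For illustration, the stopgap could look roughly like the sketch below. It is in Python rather than C#, and `DrainableEventWriter`, `queue_event`, and `request_shutdown` are hypothetical names, not the SDK's EventWriter API. The assumption is that the ctrl+break handler calls `request_shutdown()`, after which queued events are dropped without being written or reported.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the queue once shutdown is requested


class DrainableEventWriter:
    """Hypothetical writer that stops reporting progress once shutdown begins."""

    def __init__(self, sink, on_written):
        self._sink = sink              # file-like object standing in for stdout
        self._on_written = on_written  # stand-in for the progress reporter
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._stopping = threading.Event()
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def queue_event(self, event):
        """Queue an event; returns False if shutdown was already requested."""
        with self._lock:
            if self._stopping.is_set():
                return False
            self._queue.put(event)
            return True

    def request_shutdown(self):
        """Call from the ctrl+break handler; returns how many events were dropped."""
        with self._lock:
            self._stopping.set()
            self._queue.put(_SENTINEL)  # always the last item in the queue
        self._worker.join()
        return self.dropped

    def _run(self):
        while True:
            event = self._queue.get()
            if self._stopping.is_set():
                # Drop this event and everything behind it, unreported.
                while event is not _SENTINEL:
                    self.dropped += 1
                    event = self._queue.get()
                return
            self._sink.write(event + "\n")
            self._sink.flush()
            self._on_written(event)  # only events actually written are reported
```

An event that is already being written when shutdown is requested is still written and reported, so this does not help with the events lost shortly before the ctrl+break arrives. It only keeps the progress reports from running far ahead of what was written.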

itay commented 9 years ago

@brattonc we'll look into this and see what we can do. Pull requests as always are welcome.

glennblock commented 9 years ago

Thank you @brattonc. We will definitely look into this. On Wed, Apr 29, 2015 at 11:11 AM Itay Neeman notifications@github.com wrote:

@brattonc we'll look into this and see what we can do. Pull requests as always are welcome.

ncanumalla-splunk commented 2 years ago

This SDK is deprecated and no longer under active development.