splunk / eventgen

Splunk Event Generator: Eventgen
Apache License 2.0

SA-Eventgen creates events in future [BUG] #338

Closed DieterSchmitz closed 4 years ago

DieterSchmitz commented 4 years ago

Describe the bug

We're using version 6.3.3 of SA-Eventgen to generate events in Splunk. For that, we've created several CSV files which contain the data. Each file covers approx. 10 minutes. The "end" option in the eventgen.conf file is set to 2, so around 20 minutes of data should be created.

After Splunk restarts, events are created successfully. But on closer inspection, it seems that events were created in the future. Eventgen finishes its work after 1 minute, and during this period all of the data is generated (the whole 20 minutes). Generally this would not be a problem, but we want to run Eventgen continuously in our test lab, and this behavior quickly uses up our license.

How can we prevent SA-Eventgen from creating events in the future?

To Reproduce

Steps to reproduce the behavior:

  1. Create a simple CSV file with, e.g., 5000 lines
  2. The timestamp of the last event should be 20 minutes after the first event (this makes it easier to see that events were created in the future)
  3. Use the sample file (see below)

Expected behavior

Events should be created in real time and not in the future. The current behavior prevents us from running Eventgen indefinitely.

Actual behavior

Events were created as fast as possible, so most of the generated events were timestamped in the future.

Sample files and eventgen.conf file

[csvfile.csv]
mode = replay
sampletype = csv
outputMode = splunkstream
end = 2

token.0.token = \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
token.0.replacementType = replaytimestamp
token.0.replacement = %Y-%m-%d %H:%M:%S.%f

Do you run eventgen with SA-eventgen? Yes


li-wu commented 4 years ago

Could you try version 6.5.2 and check whether you still have this issue?

DieterSchmitz commented 4 years ago

Same behavior in version 6.5.2

li-wu commented 4 years ago

@DieterSchmitz Could you attach your sample file csvfile.csv so that I can reproduce the issue in my local environment? Thanks.

li-wu commented 4 years ago

This is the documentation for replaytimestamp:

For replaytimestamp, the token will be replaced with the strptime specified in the replacement setting but the time will not be based on earliest and latest, but will instead be replaced by looking at the offset of the timestamp in the current event versus the first event, and then adding that time difference to the timestamp when we started processing the sample. This allows for replaying events with a new timestamp but to look much like the original transaction. Assumes replacement value is the same strptime format as the original token we're replacing, otherwise it will fail. First timestamp will be the value of earliest. NOT TO BE CONFUSED WITH REPLAY MODE. Replay mode replays a whole file with timing to look like the original file. This will allow a single transaction to be replayed with some randomness.
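The offset rule quoted above can be sketched in a few lines of Python. This is only an illustration of the described arithmetic, not Eventgen's actual code; the function name `shift_timestamps` and the example values are made up for this sketch. It also shows why, if every event is written immediately, the later events carry timestamps ahead of the wall clock.

```python
from datetime import datetime

# Same strptime format as the token replacement in the conf above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def shift_timestamps(original, replay_start):
    """Re-anchor timestamps at replay_start, keeping each event's
    offset from the first event (hypothetical helper, not Eventgen API)."""
    parsed = [datetime.strptime(ts, FMT) for ts in original]
    first = parsed[0]
    # %f prints microseconds (6 digits); trim to milliseconds like the sample.
    return [(replay_start + (ts - first)).strftime(FMT)[:-3] for ts in parsed]


# Made-up sample spanning 20 minutes, as in the reproduction steps.
sample = [
    "2020-01-01 10:00:00.000",
    "2020-01-01 10:00:01.200",
    "2020-01-01 10:20:00.000",
]
replay_start = datetime(2020, 3, 1, 8, 0, 0)
shifted = shift_timestamps(sample, replay_start)
print(shifted)
```

If all three events are emitted at 08:00:00, the last one is stamped 08:20:00, i.e. 20 minutes in the future.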

DieterSchmitz commented 4 years ago

@li-wu Thanks for your help. I've attached the sample file (csvfile.zip). If you want, you can download the complete Splunk app here: https://bit.ly/37Hx5ci

li-wu commented 4 years ago

@DieterSchmitz Thanks for the quick response. As the replaytimestamp documentation above says, the first timestamp will be the value of earliest, so you need to add earliest = -2h to your conf file. The complete conf should be:

[csvfile.csv]
mode = replay
sampletype = csv
earliest = -2h
end = 2

token.0.token = \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
token.0.replacementType = replaytimestamp
token.0.replacement = %Y-%m-%d %H:%M:%S.%f

DieterSchmitz commented 4 years ago

@li-wu Our Eventgen app is used internally and by our partners. The partners start the Eventgen app 15 minutes before a presentation begins. In that case it is OK to create events in the future, as they need sample data for just 1-2 hours. After the presentation, no more sample data is generated or required, so everything is fine in that scenario. Our own case is a little different: we run the Eventgen app continuously in our internal lab, so we cannot use the "earliest" setting. The current behaviour quickly uses up our Splunk license.

li-wu commented 4 years ago

I did not understand your point about the internal use case. Do you want to generate events continuously without using up the Splunk index quota?

DieterSchmitz commented 4 years ago

The point is that too much data is generated. At the moment the Splunk Eventgen app produces sample data as fast as possible. Producing 2 hours of sample data takes only 5-10 minutes. This means that if we run the Eventgen app for 24 hours, we get sample data for approximately 14 days. That's too much for our internal license. The two-hour sample file should take two hours to replay.
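As a rough check of those numbers, the volume can be estimated with a small Python calculation. The helper `days_of_data` is a hypothetical name used only here, and it assumes generation runs back to back with no pause between passes.

```python
def days_of_data(run_hours, sample_hours, minutes_per_pass):
    """Days' worth of sample data produced when a sample covering
    sample_hours is regenerated every minutes_per_pass minutes."""
    passes = run_hours * 60 / minutes_per_pass
    return passes * sample_hours / 24


# 24 hours of runtime, 2-hour sample, 5 or 10 minutes per pass.
fast = days_of_data(24, 2, 5)
slow = days_of_data(24, 2, 10)
print(fast, slow)  # 24.0 12.0
```

Under these assumptions a 24-hour run yields between 12 and 24 days of data, which is in line with the "approximately 14 days" reported above.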

li-wu commented 4 years ago

Assuming it takes 5 minutes to generate 2 hours of sample data, you can add interval = 7200 to the conf (that is, generate 12 times, with an interval of 2 hours between runs).

li-wu commented 4 years ago

@DieterSchmitz I am closing this ticket. Feel free to reopen it if you still have problems.

DieterSchmitz commented 4 years ago

@li-wu According to the documentation on GitHub, the interval setting is valid in mode = sample only. We need to replace the timestamps, which is (only???) done in mode = replay.

Can you confirm that when we start creating events, SA-Eventgen does so as fast as possible and ignores the time difference between events (lines) in the sample file? And if so, is there a setting to change this behavior?

If there is a time difference of, let's say, 1.2 seconds between the first and second lines in the sample file, SA-Eventgen should wait 1.2 seconds before sending the second line.
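The pacing being asked for here could look roughly like the following Python sketch. It is not how Eventgen is implemented; `inter_event_delays` and `paced_replay` are hypothetical names, and the `send` and `sleep` callables are injected so that the example runs without actually waiting.

```python
import time
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def inter_event_delays(timestamps):
    """Seconds to wait before each event: 0 for the first, then the gap
    between consecutive timestamps in the sample."""
    parsed = [datetime.strptime(ts, FMT) for ts in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
    return [0.0] + gaps


def paced_replay(lines, timestamps, send, sleep=time.sleep):
    """Send each line only after waiting out the gap recorded in the sample."""
    for line, delay in zip(lines, inter_event_delays(timestamps)):
        if delay > 0:
            sleep(delay)
        send(line)


# Demo with a recording stand-in for sleep, so nothing actually blocks.
slept, sent = [], []
paced_replay(
    ["first line", "second line"],
    ["2020-01-01 10:00:00.000", "2020-01-01 10:00:01.200"],
    send=sent.append,
    sleep=slept.append,
)
print(slept, sent)
```

With the default `sleep=time.sleep`, the second line would go out 1.2 seconds after the first, so a two-hour sample would take two hours to replay.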

li-wu commented 4 years ago

I believe interval works with mode = replay as well. I tested a very simple case in my local environment. Maybe we should get the doc fixed. Could you try it for your situation as well? Thanks. Do not forget to set end = -1 for your test.