serilog-contrib / serilog-sinks-splunk

A Serilog sink that writes to Splunk
https://splunk.com
Apache License 2.0
46 stars 47 forks source link

Just updated from 3.7.0 to 4.0.0/4.0.1 and I cant get splunk to index the data #183

Closed MatthewHays closed 5 months ago

MatthewHays commented 5 months ago

I updated my project to the latest serilog splunk package and now my logs no longer appear in splunk. I'm using Serilog 3.1.1 and I have a test case to log a handful of lines to Splunk that works fine with 3.7.0 but doesn't with 4.0.0 and 4.0.1. I've enabled self logging and get nothing, I've force flushed the log and disposed, and setting the batch size to 1 etc and nothing seems to work anymore. Have I missed something obvious? Our current Splunk backend is v7.3.1

Apologies if this is the wrong place to raise this, happy to redirect my issue elsewhere (StackOverflow?)

Thanks

MatthewHays commented 5 months ago

For what its worth, this is our test case. We've had this for a number of years and it worked fine before

[Test, Explicit("Used for diagnosing Splunk connectivity issues")]
public void WriteToSplunk()
{
    var splunkEndpoint = "http://fx-uat-splunk:8088";
    var splunkToken = "<redacted>";

    var loggerConfig = new LoggerConfiguration();

    Serilog.Debugging.SelfLog.Enable(msg => Debug.WriteLine(msg));

    loggerConfig
        .MinimumLevel.Verbose()
        .WriteTo.EventCollector(splunkEndpoint, splunkToken, restrictedToMinimumLevel: LogEventLevel.Debug);

    using (var logger = loggerConfig.CreateLogger())
    {
        Serilog.Log.Logger = logger;

        logger.Information("Information message {@param}", new { Property1 = 1, Property2 = 2 });
        logger.Warning("Warning message {@param}", "Hello this is a string");
        logger.Error(new Exception("Bang"), "Error message");
    }
    Serilog.Log.CloseAndFlush();
 }
EEParker commented 5 months ago

Thanks for the report, Do you have some more information about your environment, e.g. dotnet target framework, host operating system?

EEParker commented 5 months ago

Additionally, have you tried setting your splunk endpoint to use the full event collector address? e.g. http://splunk:8088/services/collector/event

ref: https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/UsetheHTTPEventCollector#Send_data_to_HTTP_Event_Collector_on_Splunk_Enterprise

MatthewHays commented 5 months ago

Hi, I've just debugged through this, and hacked the new and old code about in VS, and it looks like a bug with the new time logic ToEpoch in 3.7.0 returns a correct time, but in 4.0.1 it appears to generate a time one hour into the future, which is why splunk isnt showing my logs. I'm the UK and currently at UTC+1

image

image

MatthewHays commented 5 months ago

Just to add, value being passed in is image

new Epoch func returns = Friday, 5 April 2024 16:55:48.667 (GMT) old Epoch func returns = Friday, 5 April 2024 15:55:48.667 (GMT)

MatthewHays commented 5 months ago

Converting value to UTC first fixes this, ie

ToSeconds(value.ToUniversalTime().Ticks - Epoch.Ticks);

EEParker commented 5 months ago

@MatthewHays I've updated the code with your suggestions and added some unit tests for this condition. Can you try nuget version 4.0.1-dev-00023?

MatthewHays commented 5 months ago

@EEParker Thanks for the fast turnaround. I'll be on Easter holidays for a bit, but that change looks good,. I'll throw the new nuget package into our code base when I get back.

EEParker commented 5 months ago

I'm going to go ahead and release the 4.0.2 as a patch to fix this time error. If it doesn't resolve your issue, I'll reconnect with you after the weekend.

VictorioBerra commented 5 months ago

@MatthewHays Thank you for sharing that test.

A small tidbit, we just updated our servers to Windows 2022, and in our .NET 4.8 app Splunk logging stopped. Took Wireshark to tell us that the .NET app was trying to use TLS 1.0. We needed to add a registry key to fix it which forces Windows Server to use tls 1.2+ for .NET. Something like this: https://www.seequent.com/how-to-enable-tls-1-2-as-default-in-windows/

I just point this out as even a unit test like the one you have would not have caught that. Maybe someone from Google will find this someday too.

MatthewHays commented 5 months ago

It's not specifically a unit test, its just something we run when something changes on our side related to Splunk (firewalls, IPs, libraries, versions etc) and we want to manually confirm that logs are still being indexed. I just threw this into this code base locally to drive the code in 3.7.0 and 4.0.1 to find the difference in the debugger that caused the prob. We could programmatically query Splunk via the API to check the messages appear or I supposed we could regression test that the exact same payload gets sent on the wire as you expect, by mocking out the httpclient and comparing against a canned result, but will leave that up to you guys as to what you think is best. (And yes, no suite of tests will catch every possible scenario)

EEParker commented 5 months ago

@MatthewHays Thank you for sharing that test.

A small tidbit, we just updated our servers to Windows 2022, and in our .NET 4.8 app Splunk logging stopped. Took Wireshark to tell us that the .NET app was trying to use TLS 1.0. We needed to add a registry key to fix it which forces Windows Server to use tls 1.2+ for .NET. Something like this: https://www.seequent.com/how-to-enable-tls-1-2-as-default-in-windows/

I just point this out as even a unit test like the one you have would not have caught that. Maybe someone from Google will find this someday too.

I just ran into this as well,

Fix for TLS 1.2 on windows servers

reg add HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319 /v SchUseStrongCrypto /t REG_DWORD /d 1

Note: this may have widespread impacts, so be sure you need it.

VictorioBerra commented 5 months ago

@EEParker Good point. For us, we had just provisioned new 2022 servers using an in-place upgrade. We were not sure why our .NET 4.8 app + Serilog Splunk was trying to use TLS 1.0. Wireshark showed us the handshake was failing. Upon setting that registry key everything started flowing.

EEParker commented 5 months ago

I added this to the wiki. There might be a better spot, but I wanted to highlight it. https://github.com/serilog-contrib/serilog-sinks-splunk/wiki#windows-tls-12

EEParker commented 5 months ago

@MatthewHays can you confirm if 4.0.3 and the TLS configuration resolve this issue for you?

https://www.nuget.org/packages/Serilog.Sinks.Splunk/4.0.3

MatthewHays commented 5 months ago

@EEParker I'll be back in the office on the 15th so will check it then, this would break anyone pushing logs from a + GMT timezone (likely why it wasn't spotted in the US, as it will work but just the timestamp will be wrong). Happy for this to be closed, the fix is the same I did locally to get it working.

MatthewHays commented 5 months ago

@EEParker yep working, all good thanks.

EEParker commented 5 months ago

https://github.com/serilog-contrib/serilog-sinks-splunk/wiki#400-breaking-changes added here