serilog-mssql / serilog-sinks-mssqlserver

A Serilog sink that writes events to Microsoft SQL Server and Azure SQL
Apache License 2.0
276 stars 147 forks source link

SqlBulkCopy inefficient space utilization with small batches #509

Closed BrettJaner closed 6 months ago

BrettJaner commented 7 months ago

Hey Serilog Contributors,

I'm having an issue with the table I'm logging to (the LogEvents table). SQL Server is allocating a lot more space for it than it is actually using (in my case it's about 8 times unused vs used). I believe I've found the culprit. Since SQL Server 2016, bulk load context (fast inserts) is turned on by default. This means that if batches are relatively small or not close to the size of a multiple of a SQL Server extent (64KB), space will be allocated to the LogEvents table but not used.

SQL Server 2016, Minimal logging and Impact of the Batchsize in bulk load operations

The documentation says you can use a trace flag 692 to turn off the bulk load context but doesn't look like this is available in Azure SQL DB. I've looked at tuning the batch size, but without extending the batch period to be too long, I can't get the results I'm seeking. The only option I have is to rebuild or reorganize the clustered index of the LogEvents table every so often to reclaim the unused space. However, this solution is only viable when I have access to the application's environment and infrastructure, which is not always the case for my application in production.

I've seen a previously closed issue (#132) talking about alternatives to SQLBulkCopy. Maybe it is time we revisit that conversation. I'm going to fork this repo, make some edits to use the INSERT statement instead of SQLBulkCopy, and run it in my test environment to see how that goes.

Thanks, Brett

BrettJaner commented 7 months ago

Here's what I'm doing in my fork (86cec67)

  1. Add property called UseSqlBulkCopy to MSSqlServerSinkOptions defaulting to true
  2. Add a SqlInsertBatchWriter which uses INSERT statements (highly based off of the SqlLogEventWriter used by the MSSqlServerAuditSink)
jonorossi commented 7 months ago

I believe I've found the culprit. Since SQL Server 2016, bulk load context (fast inserts) is turned on by default. This means that if batches are relatively small or not close to the size of a multiple of a SQL Server extent (64KB), space will be allocated to the LogEvents table but not used.

That is definitely the cause.

We've worked around this issue from SQL Server 2016 for heaps of log tables where we insert small amounts of data, especially on very frequent fixed intervals. It just happens that our Serilog logs table is mostly quiet so doesn't grow like other tables and we've got table maintenance processes set up which sorts it out.

I haven't looked at your code changes in detail, but it would be a worthwhile pull request.

BrettJaner commented 7 months ago

@jonorossi Appreciate the confirmation and related anecdote. Pull request is open.

ckadluba commented 7 months ago

This really sounds like a good improvement. Thank you @BrettJaner for researching and creating the PR.

If I understand this correctly, the behaviour of the batched sink should not change with this for users of SQL Server 2016 or higher even if we make this changed implementation the default. Is this assumption correct?

@jonorossi what do you think? Should we give this a new major version (e.g. 7.0.0) or is 6.6.0 sufficient since it is not even a breaking change for most users?

BrettJaner commented 7 months ago

If I understand this correctly, the behaviour of the batched sink should not change with this for users of SQL Server 2016 or higher even if we make this changed implementation the default. Is this assumption correct?

If we make the new INSERT statement batch writer the default (UseSqlBulkCopy = false), we'd lose out on the benefits of SqlBulkCopy which would probably be felt by consumers with high log volume scenarios. My gut tells to release the new INSERT statement batch writer in a non-breaking fashion, requiring a consumer to opt-in to it's functionality, and see how it's adopted. We could always make it the default implementation in a later (breaking) release.

jonorossi commented 7 months ago

@ckadluba A 6.x version is fine since it's not a breaking change, just a new feature. I agree with @BrettJaner, we could always make it the default in a future major release. We could also consider the size of the batch (or batches) being written and dynamically pick one or the other.

ckadluba commented 7 months ago

Okay, so the new insert batch writer is not on by default in your PR. I did not look at your changes in detail yet and from the discussion I had the misunderstanding that it was defaulting to the changed behaviour. That's why I was asking about the major version increase. But if it is opt-in, I think it is safe to add it. Thanks for clarification.

@jonorossi the dynamic mode you described sounds quite interesting. It might improve the storage and runtime behaviour for workloads with varying batch sizes. But to approach this in small digestable steps we could add the dynamic batch mode in another PR later and in a new major release it could become the default. What do you think? I created a 7.0.0 milestone and documented this in the description (https://github.com/serilog-mssql/serilog-sinks-mssqlserver/milestone/2).

jonorossi commented 7 months ago

But to approach this in small digestable steps we could add the dynamic batch mode in another PR later and in a new major release it could become the default. What do you think? I created a 7.0.0 milestone and documented this in the description

@ckadluba I completely agree. Get this one sorted and released so Brett is unblocked. I don't even know what the threshold would be for a dynamic selection, we'd have to do some performance and storage testing.

ckadluba commented 6 months ago

@BrettJaner Thank you very much for this great PR! I have updated the version of the library to 6.6.0 and a prerelease version was created on nuget.org. https://www.nuget.org/packages/Serilog.Sinks.MSSqlServer/6.6.0-dev-00072

Could you please try it out? If we find no problems with it, we're good to release a regular 6.6.0 version.

BrettJaner commented 6 months ago

@BrettJaner Thank you very much for this great PR! I have updated the version of the library to 6.6.0 and a prerelease version was created on nuget.org. https://www.nuget.org/packages/Serilog.Sinks.MSSqlServer/6.6.0-dev-00072

Could you please try it out? If we find no problems with it, we're good to release a regular 6.6.0 version.

You're welcome, @ckadluba! Will get the pre-released version into our test environment and will report back after we have some mileage on it, maybe a week or so.

BrettJaner commented 6 months ago

Hi @ckadluba, pre-release version looks good in our test environment. I'd say you are safe to release from my perspective. Thank you (and @jonorossi too) for maintaining this sink and moving this issue along!

ckadluba commented 6 months ago

@BrettJaner very nice to hear that.

Thak you for testing the perelease and for the great contribution. We appreciate this a lot. :)

I just create release 6.6.0 by merging dev into main. Closing this issue now.