jmkelly closed this issue 7 years ago
This may be related to https://github.com/npgsql/npgsql/issues/1232. That thread gives some good ideas, and I'll update to a newer version of Npgsql to see if that fixes it.
It's super awesome that you are doing this!
I can recall having come across someone complaining that Npgsql had a leak in its connection pool, though I have no idea whether that could be related to this issue.
Good to see the tests pass!
Was it just a matter of increasing the timeout?
PS: it's a little bit slow, but it's not THAT slow....
I receive around 650 msg/s on my machine. IIRC with the SQL Server transport I can receive ~1000 msg/s, so it's definitely in the same league.
Turns out that if you don't set the maximum pool size in your connection string, Npgsql seems to assume a value that is far too high. I've set it to 30 (which is supposed to be the default) and haven't had any problems since, and the tests now pass.
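For reference, the cap can be set directly in the connection string. This is just a sketch with placeholder host/database/credential values; the relevant part is the `Maximum Pool Size` setting:

```csharp
// Hypothetical example: capping Npgsql's connection pool via the connection
// string. Host, database, and credentials are placeholders; "Maximum Pool Size"
// is what keeps the pool below the server's max_connections limit.
var connectionString =
    "Host=localhost;Database=rebus_test;Username=postgres;Password=secret;" +
    "Maximum Pool Size=30";
```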
Maybe I need to tweak the pg settings a little. I'm only getting 324 msg/s on a stock standard pg install on windows, but getting 871 msg/s on Sql Server (both on localhost).
If it's OK with you, I'd like to merge it now
....unless you want to tweak first?
Aaah.... check this out:
readonly AsyncBottleneck _bottleneck = new AsyncBottleneck(Environment.ProcessorCount);
What this means (on my machine) is that only four asynchronous receive operations will be allowed at any point in time. Since the `Receive` method does a couple of `await`s, the CPUs can be much better utilized by allowing more async operations, e.g.
readonly AsyncBottleneck _bottleneck = new AsyncBottleneck(20);
makes this:
10000 messages received in 20.1 s - that's 498.3 msg/s
10000 messages received in 17.1 s - that's 584.5 msg/s
10000 messages received in 23.4 s - that's 427.9 msg/s
into this:
10000 messages received in 9.3 s - that's 1079.6 msg/s
10000 messages received in 8.9 s - that's 1129.8 msg/s
10000 messages received in 9.5 s - that's 1047.2 msg/s
The `AsyncBottleneck` was introduced in the SQL Server transport because otherwise the connection pool would quickly be depleted, resulting in timeouts when trying to get the next connection.
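For illustration, a bottleneck like this can be built on `SemaphoreSlim`. The following is a minimal sketch of the idea (names and shape are assumptions, not Rebus' actual implementation):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of an async bottleneck: a thin wrapper around SemaphoreSlim that
// limits how many async operations are in flight at once. Illustrative only,
// not Rebus' actual AsyncBottleneck code.
class AsyncBottleneck
{
    readonly SemaphoreSlim _semaphore;

    public AsyncBottleneck(int maxParallelOperations)
        => _semaphore = new SemaphoreSlim(maxParallelOperations);

    // Waits until a slot is free, then returns a token that frees the slot
    // again when disposed.
    public async Task<IDisposable> Enter(CancellationToken token = default)
    {
        await _semaphore.WaitAsync(token);
        return new Releaser(_semaphore);
    }

    class Releaser : IDisposable
    {
        readonly SemaphoreSlim _semaphore;
        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;
        public void Dispose() => _semaphore.Release();
    }
}
```

Each receive loop would then `await bottleneck.Enter(...)` and dispose the returned token when done, so at most `maxParallelOperations` receives run concurrently regardless of how many `await`s each one performs.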
OK, I set the bottleneck the same as the SQL Server transport based on your comment. I thought it would be better set to the processor count, but that was just a stab early on and I forgot to change it back (this is my first go at async/await stuff, so it's been a good learning experience).
What code are you getting those perf numbers from? Just want to make sure we're on the same page and talking about the same numbers, mostly because I'm not seeing the speedup you have on the tests I've run.
Happy for you to merge anytime though.
I got the perf numbers simply by running the `NizzleName` test case with the specified number of messages...
Thanks. That was the one I was using as well, but I'm still only getting ~400 msg/s. It may be something to do with that maximum pool setting; I'll have a play with it tomorrow and see if I can get those numbers up a bit.
I'll leave as is for the moment.
If I bump up the number of workers on the receiver, I manage to get a lot more throughput. But I can't seem to explain why I sometimes get ~250 msg/s per worker and other times only ~50 msg/s per worker. I'd say there is something going on inside PostgreSQL, possibly the vacuum. With a few more workers, though, it can get over 1000 msg/s total quite comfortably.
Looking at the graphs provided by the BigSql installation, when we go into the receive phase there are approximately 3x as many selects per second as deletes per second, and this stays constant. The number of deletes matches the msg/s processed, so this may be somewhere to look for performance gains.
Thanks for the contribution! It's out in 2.1.0 now.
Awesome!
This is a crack at getting PostgreSQL to handle the Rebus transport. As it stands though, I wouldn't be merging this, as I hit a wall when it comes to how PostgreSQL/Rebus handles the db connections, so if anyone has some bright ideas, I'm all ears.
As it stands, in the TestPostgreSqlTransportReceivePerformance test, if you specify a number of messages above PostgreSQL's `show max_connections` value, the test will exhaust the connection pool (though it seems to vary as to exactly when the pool is exhausted). max_connections is traditionally set to 100 in new installs, so I could just set it higher (in the connection string, I believe), but I think that would be hiding other issues. Reading up on Npgsql, it should return connections to the pool once they are closed, so I'm thinking it is something to do with my connection logic, which I largely copied from the SqlServer implementation.
The queue pop is implemented using PostgreSQL 9.5's SKIP LOCKED syntax, so this won't work on versions earlier than 9.5, but performance should be up there with SQL Server, if not better.
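For context, a SKIP LOCKED queue pop typically looks something like the following. This is a generic sketch with assumed table and column names, not necessarily the exact query in this PR:

```sql
-- Hypothetical sketch of a SKIP LOCKED queue pop (table/column names assumed).
-- Each receiver locks one eligible row; rows already locked by concurrent
-- receivers are skipped rather than blocked on, so workers pop in parallel.
DELETE FROM messages
WHERE id = (
    SELECT id
    FROM messages
    WHERE visible <= clock_timestamp()
    ORDER BY priority DESC, id ASC
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING headers, body;
```

The `DELETE ... RETURNING` shape makes the pop atomic: the message is removed and its contents handed back in one statement, so a crashed receiver before that point simply leaves the row for the next worker.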
Rebus is MIT-licensed. The code submitted in this pull request needs to carry the MIT license too. By leaving this text in, I hereby acknowledge that the code submitted in the pull request has the MIT license and can be merged with the Rebus codebase.