npgsql / npgsql

Npgsql is the .NET data provider for PostgreSQL.
http://www.npgsql.org
PostgreSQL License
3.32k stars 820 forks source link

Only one usage of each socket address (protocol/network address/port) is normally permitted #1386

Closed kurtschelfthout closed 7 years ago

kurtschelfthout commented 7 years ago

Steps to reproduce

It looks like if pooling is turned off in the connection string, and a process tries to make a lot of sequential calls, an exception is thrown on connection.Open().

Connection string, like: Host={host};Port=6543; Username={username};Password={password};Database={db}; Pooling=false

This connects to pgbouncer in session pooling mode, postgres on the same host as pgbouncer, but client on a different host.

Then execute a bunch of queries one after the other. Not sure if it matters but the queries use a temp table and a transaction:

BEGIN;
CREATE TEMP TABLE temp
ON COMMIT DROP AS
SELECT *
FROM bar
WHERE {where};

SELECT *
FROM baz t
WHERE {morewhere}
  AND t.col = ANY(SELECT e.col FROM temp e);

SELECT * FROM temp t;
COMMIT;

The issue

In this scenario intermittently the following exception is thrown:

System.Net.Sockets.SocketException (0x80004005): Only one usage of each socket address (protocol/network address/port) is normally permitted
   at Npgsql.NpgsqlConnector.Connect(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnection.OpenInternal()

Further technical details

Npgsql version: 3.1.9 PostgreSQL version: 9.6.1 PgBouncer version: 1.7.2 Operating system: Windows

roji commented 7 years ago

@kurtschelfthout, sorry for taking so long to get to this.

You may be running into the issue described here, by opening and closing so many sockets you may be exhausting the ~4000 ports available by default.

Can you please try the workaround described in this link (increasing MaxUserPort to 65534 and restarting)? If the issue goes away then we have a confirmation, and I'll look into implementing a workaround inside Npgsql itself.

roji commented 7 years ago

A much more in-depth article describing the issue: http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

Assuming that this is your problem, I think the right way to deal with it is to increase MaxUserPort to allow for more ephemeral ports (e.g. 65534), I don't think Npgsql should really attempt to deal with this internally. But I'll wait for your confirmation.

kurtschelfthout commented 7 years ago

Yes, that does seem like the likely cause, thank you. Instead of the suggested solution, I have opted to use pooling but aggressively tune the pool and lower the time until connections get removed from the pool. This (hopefully - in test) strikes a balance - reducing connection setup time and avoiding the problem above, while still closing connections quicker to give pgbouncer enough flexibility.

roji commented 7 years ago

@kurtschelfthout you're right, pooling really is the right solution to this problem.

May I ask for the reason for using pgbouncer in the first place, rather than Npgsql's connection pool? Are you using pgbouncer to pool centrally for multiple client servers?

kurtschelfthout commented 7 years ago

Yes, we are accessing a relatively low number of postgres servers (say a handful) from 1000s of machines on our compute grid. Each grid machine typically only needs a connection to postgres for a short time (say a minute or so) for a given task, and some may not need it at all, but some also do many queries in a short time (which should really be optimized, but other priorities...). So we need to strike a bit of a balance between allowing pgbouncer to do its work (i.e. reusing connections among many clients ) and doing many queries from a single client without paying the setup/teardown cost each time.

roji commented 7 years ago

OK, thanks! That's definitely one scenario where pgbouncer makes sense.

roji commented 7 years ago

And your solution of turning on Npgsql pooling but with aggressive pruning is definitely the way to go - for cases where a grid machine does many open/close in a very short time you'll have a performance boost in addition to getting rid of the error.

omeraplak commented 7 years ago

Hello, I have a same issue. If i close the npgsql pooling i get a same error. I don't understand because "http://www.npgsql.org/doc/compatibility.html" say "close the pooling". Can you explain more this issue?

@roji @kurtschelfthout Thanks.

roji commented 7 years ago

@omeraplak I'm not sure I understand the issue... Are you using Npgsql without pooling (Pooling=false in the connection string), and are getting the same error (Only one usage of each socket address (protocol/network address/port) is normally permitted)? If so, read above on why this is happening: Windows local port exhaustion which has nothing really to do with Npgsql.

In general, pooling shouldn't be turned off unless you have a good reason and understand all the consequences. Feel free to provide more details and I'll try to help.

omeraplak commented 7 years ago

Hey @roji, thank you for reply. My setup is, .NET Web API Async Npgsql --> pgBouncer (on linux) --> AWS RDS Postgresql

I select "session" mode on pgbouncer and i set pooling=false in api application and pgbouncer eating all ports minutes later.

File descriptor limit: 1024 (H:4096), max_client_conn: 1000, max fds possible: 1010
listening on 0.0.0.0:6432
listening on ::/6432
listening on unix:/var/run/postgresql/.s.PGSQL.6432
process up: pgbouncer 1.7.2, libevent 2.0.21-stable (epoll), adns: evdns2, tls: OpenSSL 1.0.2g  1 Mar 2016
C-0xe73980: (nodb)/(nouser)@10.0.0.191:60128 registered new auto-database: db = dbname
C-0xe73980: dbname/postgres@10.0.0.191:60128 login attempt: db=dbname user=postgres tls=no
S-0xe786c0: dbname/dbname@10.0.0.52:5432 new connection to server (from 10.0.0.198:51310)
C-0xe73980: dbname/postgres@10.0.0.191:60128 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60129 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60129 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60130 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60130 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60131 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60131 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60132 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60132 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60133 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60133 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60134 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60134 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60138 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60138 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60140 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60140 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60141 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60141 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60142 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60142 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60143 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60143 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60144 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60144 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60145 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60145 closing because: client close request (age=0)
C-0xe73980: dbname/postgres@10.0.0.191:60146 login attempt: db=dbname user=postgres tls=no
C-0xe73980: dbname/postgres@10.0.0.191:60146 closing because: client close request (age=0)`

You look 10.0.0.191 using ports. ...33,...34,...38 ....... ...46, then i haven't availavable socket and throw the error. Why pgbouncer not re-use again ports? All time open new ports. I didn't understand. If you want i can put my pgbouncer config.

Thanks

roji commented 7 years ago

@omeraplak, can you post the error you're seeing? Is it on the Npgsql side or on the pgbouncer side?

The ports you're seeing (...33, ...34) are the ephemeral TCP ports for each new incoming connection from Npgsql (and not outgoing connections to PostgreSQL). It seems normal for pgbouncer to get a new ephemeral port from the OS for each client, this only becomes a problem if too ports connections are open/closed too quickly, as described https://blogs.msdn.microsoft.com/dgorti/2005/09/18/only-one-usage-of-each-socket-address-protocolnetwork-addressport-is-normally-permitted/.

omeraplak commented 7 years ago

@roji ohh okay i understand this problem from windows OS. I think fixed now. I putting powershell code for this fix. Select JUST one fix. I prefer to 2. Maybe you can put warning on npgsql.org :)

  1. $KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    Set-ItemProperty -Path $KeyPath -Name "MaxUserPort" -Value 65534
  2. $KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    Set-ItemProperty -Path $KeyPath -Name "TcpTimedWaitDelay" -Value 30

Thank you!

omeraplak commented 7 years ago

Hello again, I think is not fixed because this parameters optimization so hard. I want more explain my case, I have Postgresql 9.4.9, 3k request/per second and avarage(has autoscaling) 8 API (.Net Web API).

Firstly, I set 30 to TcpTimedWaitDelay. It's more good but 45-50 minutes later is giving same error.

Secondly, I set 30 to TcpTimeWaitDelay and set to MaxUserPort 65534 (max value). It's okay but so much ports expiring (because TcpTimedWaitDelay 30 seconds and so low, OS default 240 seconds) and my API's get a CPU cost.

And i stop. I look the mirror and i ask myself "What am I doing". It's so low level configuration and my application all time need regedit settings (it's so bad) I want to just use the pgBouncer. Someone has an idea?

@roji error exception is,

System.Net.Sockets.SocketException (0x80004005): Only one usage of each socket address (protocol/network address/port) is normally permitted
   at Npgsql.NpgsqlConnector.Connect(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnection.OpenInternal()
roji commented 7 years ago

@omeraplak I understand your situation, and it does sound painful. The only thing I don't really understand is the higher CPU cost you mentioned, I dont' know why that would be happening.

I'm not sure what to suggest... First, this doesn't have anything to do with Npgsql - all it does is open physical connections to your pg_bouncer. Have you tried describing your problem to the pg_bouncer people?

It may be that in very short-lived connection scenarios such as your own, with a very large volume of requests, pg_bouncer on Windows simply doesn't provide a good solution. I have to say that for very short-lived connections, the TCP overhead of establishing and closing physical connections with pg_bouncer seems problematic even if you didn't have this Windows-specific port exhaustion problem. This is one big advantage of in-process connection pooling - unlike with pg_bouncer there's virtually no connection overhead at all.

So you can always turn on Npgsql pooling (even in addition to pg_bouncer), exactly like @kurtschelfthout ended up doing.

omeraplak commented 7 years ago

@roji Okay i understand but i think npgsql pooling haven't good performance. pgBouncer can fast open new server connection because designed for this but npgsql is not designed.

Sometimes my API connection pools 100 (max connection size) connection and broke down my app.I see postgresql connections just 3-4 active process, 500-600 idle connection. I think if i decrease Connection Idle Timeout is coming with performance problems. Maybe npgsql connection pooler has a leak? I don't know how i can prove this. Maybe you can brifing me and i can prove this and we fix this issue.

roji commented 7 years ago

@roji Okay i understand but i think npgsql pooling haven't good performance. pgBouncer can fast open new server connection because designed for this but npgsql is not designed.

Uh, I'm not too sure what that means... There's nothing magical about what pgBouncer does, and there's nothing about Npgsql that "isn't designed" to open new server connections quickly...

Sometimes my API connection pools 100 (max connection size) connection and broke down my app.I see postgresql connections just 3-4 active process, 500-600 idle connection. I think if i decrease Connection Idle Timeout is coming with performance problems. Maybe npgsql connection pooler has a leak? I don't know how i can prove this. Maybe you can brifing me and i can prove this and we fix this issue.

I know many people are using Npgsql connection pooling and it works well. If there's some sort of bug or leak, I definitely want to know about it so I can fix it... So if you could provide more information about what's happening (e.g. how did your app break down), that would be great.

Beyond that, Npgsql pooling has some knobs which you need to understand well. Connection Idle Timeout) is very important, it determines how long an unused connection will stay idle in the pool before it gets discarded. You can also turn on performance counters to better understand how your application is behaving.

roji commented 7 years ago

To summarize, I'll be glad to help if you think there are issues with Npgsql pooling, open a separate issue for that with the proper info and we'll go from there.

topharley commented 6 years ago

This is not an issue of npgsql, but common problem of networking (i guess only in Windows)

Execute in command line to reproduce: netsh int ipv4 set dynamic tcp start=1025 num=5000

Repro code:

    class Program
    {
        static void Main(string[] args)
        {
            var count = 100;
            try
            {
                Parallel.For(0, count, x =>
                {
                    for (var y = 0; y < count; y++)
                    {
                        var endpoint = new IPEndPoint(IPAddress.Parse("192.x.x.x"), 5432); //CHANGE TO YOURS
                        using (var socket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp))
                        {
                            socket.Connect(endpoint);
                            socket.Close();
                        }
                    }
                });

                Console.WriteLine("Oh yeah!");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }

        }
    }

and you will get

System.Net.Sockets.SocketException (0x80004005): Only one usage of each socket address (protocol/network address/port) is normally permitted
   at Npgsql.NpgsqlConnector.Connect(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout)
   at Npgsql.NpgsqlConnection.OpenInternal()

After that execute in command line: netsh int ipv4 set dynamic tcp start=1025 num=64510

This will set ports range as high as possible on Windows and it will work without errors until you'll suddenly get peak load.

xadamxk commented 1 month ago

6 years later and this discussion was the first on my google results. Windows user with the same issue. I checked for running instances mentioned above and found nothing.

Turns out I had to edit my postgresql.conf (C:/Program Files\PostgreSQL\16\data) and change the config from listen_addresses = '*' to listen_addresses = 'localhost'. Restarting the postgres service didn't do the trick but everything was working after a reboot.