Closed vbanos closed 4 years ago
I think leaving maxsize at the default 1 was just an oversight. The // 6
in calculation of num_pools suggests that the intention was to have the max number of open connections == max_threads. Which also suggests to me that maxsize=6
is the appropriate value, again following the browser convention of max 6 connections per host.
- num_pools=max((options.max_threads or 0) // 6, 400))
+ num_pools=max((options.max_threads or 0) // 6, 400), maxsize=10)
@nlevitt its possible that multiple browsers will use one warcprox instance at the same time. In fact, in SPN2 production 5 browsers use the same warcprox instance. In that case, there could be more than 6 concurrent connections to the same host.
I still think 6 is a good number. max open connections = max threads seems like a sensible convention. What is the sense of // 6
in the num_pools calculation otherwise? 6 is already a lot of connections to maintain to one host. Making a connection is not all that expensive. Increasing maxsize uses up more filehandles. It's not at all obvious what the optimal numbers are for num_pools and maxsize. Maybe a larger value for num_pools (more hosts) and smaller for maxsize (fewer connections per host) would turn out to work better. Of course it depends on the workload. We're groping in the dark here. With that in mind, there is at least logic behind choosing 6.
OK, I like your reasoning. I set it to 6.
Thanks!
We noticed a lot of log entries like this in production:
this happens because we use a
PoolManager
and create a number of pools (paramnum_pools
) but the number of connections each pool can have is just 1 by default (parammaxsize
is 1 by default).urllib3
docs say:maxsize
– Number of connections to save that can be reused. More than 1 is useful in multithreaded situations. Ref: https://urllib3.readthedocs.io/en/1.2.1/pools.html#urllib3.connectionpool.HTTPConnectionPoolI suggest to use
maxsize=10
and re-evaluate after some time if its big enough. (We'll see if we still get "Connection pool is full" warnings in the logs).This improvement will boost performance as we'll reuse more connections to remote hosts.