internetarchive / warcprox

WARC writing MITM HTTP/S proxy

Configurable CdxServerDedup urllib3 connection pool size #50

Closed vbanos closed 6 years ago

vbanos commented 6 years ago

The urllib3 connection pool defaults to maxsize=1 (see http://urllib3.readthedocs.io/en/latest/advanced-usage.html). With many writer threads sharing the pool, that is too small, and we get warnings like this:

2018-01-15 20:04:10,044 18436 WARNING WarcWriterThread030(tid=18502) urllib3.connectionpool._put_conn(connectionpool.py:277) Connection pool is full, discarding connection: wwwb-dedup

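We now set cdxserver_maxsize = args.writer_threads or 200, so the pool size follows the configured number of writer threads and falls back to 200 when that option is not set.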
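As a rough illustration of the idea (not the actual warcprox code), the sizing rule and the way it could be handed to urllib3 might look like the sketch below. The function names `pick_cdxserver_maxsize` and `make_dedup_pool` are hypothetical, and `SimpleNamespace` only stands in for the parsed command-line arguments. `urllib3.PoolManager` forwards `maxsize` to each per-host connection pool it creates. With the default non-blocking pool, connections beyond `maxsize` are still opened but are discarded after use, which is what triggers the warning above.

```python
from types import SimpleNamespace


def pick_cdxserver_maxsize(args, default=200):
    """Return the pool size: writer_threads if set, otherwise the default.

    `args` is assumed to expose a `writer_threads` attribute, as in the
    description above; the helper name itself is hypothetical.
    """
    return args.writer_threads or default


def make_dedup_pool(args):
    """Build a urllib3 PoolManager sized for the writer threads (sketch)."""
    # Imported here so the sizing helper stays usable without urllib3.
    import urllib3

    # maxsize is passed through to every per-host connection pool.
    return urllib3.PoolManager(maxsize=pick_cdxserver_maxsize(args))


if __name__ == "__main__":
    # Stand-ins for parsed command-line arguments.
    unset = SimpleNamespace(writer_threads=None)
    thirty = SimpleNamespace(writer_threads=30)
    print(pick_cdxserver_maxsize(unset))
    print(pick_cdxserver_maxsize(thirty))
```

Note that `or` treats any falsy value as unset, so under this rule `writer_threads=0` would also fall back to 200.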

Note that ideally we would reuse the value at https://github.com/internetarchive/warcprox/blob/master/warcprox/main.py#L284, but it is initialized after dedup. Because of that dependency, we cannot use it here.