threatpatrols / hibp-downloader

Efficiently download new pwned password hashes from api.pwnedpasswords.com fast
https://hibp-downloader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
15 stars 3 forks source link

Timeouts (short dns timeout value?) #2

Open wildegnux opened 10 months ago

wildegnux commented 10 months ago

I'm repeatedly getting the following errors running the downloader:

Process Process-11:
Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_sockets.py", line 190, in connect_tcp
    addr_obj = ip_address(remote_host)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'api.pwnedpasswords.com' does not appear to be an IPv4 or IPv6 address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
    stream: anyio.abc.ByteStream = await anyio.connect_tcp(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_sockets.py", line 193, in connect_tcp
    gai_res = await getaddrinfo(
              ^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fbb37227890

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 113, in connect_tcp
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 99, in handle_async_request
    raise exc
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_async/connection.py", line 124, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 112, in connect_tcp
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 178, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 182, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 228, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(prefix, hash_type=hash_type, etag=etag, encoding=encoding_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 269, in pwnedpasswords_get
    response = await httpx_binary_response(url=url, etag=etag, encoding=encoding, debug=httpx_debug)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 53, in httpx_binary_response
    response = await client.send(request=request, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 365, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwneddb/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectTimeout

The thread managing also does not handle the threads crashing, making those threads stall. Eventually every thread crashes and the main process gets stuck waiting for the now crashed threads to close.

gregsadetsky commented 10 months ago

Same here i.e. ValueError: 'api.pwnedpasswords.com' does not appear to be an IPv4 or IPv6 address (funny coincidence that I just got the same error as this issue created a few hours ago..!)

Thanks @threatpatrols for creating this tool! Cheers

ndejong commented 10 months ago

Thanks for the report(s) @gregsadetsky and @wildegnux

I've not had a chance to look into this yet, however I've been able to compare Python 3.10 vs 3.11 and somewhere in between something has broken :(

My current limited check is that I've just used hibp-downloader under Python 3.10 to update my own local --data-path to see if the latest reported Naz.API dataset is included yet - the tool works fine under 3.10

Not entirely sure the root cause is a DNS lookup issue yet, as I get slightly different messages in my 3.11 test environment

2024-01-19T01:17:44+1000 | INFO | hibp-downloader | HIBP Downloader: v0.2.1
2024-01-19T01:17:44+1000 | INFO | hibp-downloader | data-path '/home/ndejong/test/data'
2024-01-19T01:17:44+1000 | INFO | hibp-downloader | metadata-path '/home/ndejong/test/data'
2024-01-19T01:17:45+1000 | INFO | hibp-downloader | Created 4 worker processes to consume a queue of prefix-hash values.
2024-01-19T01:17:52+1000 | INFO | hibp-downloader | prefix=000ef source=[lc:0 et:0 rc:249 ro:0 xx:1] processed=[4.4MB ~24824H/s] api=[55req/s 4.4MB] runtime=0.2min
2024-01-19T01:18:01+1000 | INFO | hibp-downloader | prefix=001f3 source=[lc:0 et:0 rc:499 ro:0 xx:1] processed=[8.8MB ~26053H/s] api=[58req/s 8.8MB] runtime=0.3min
Process Process-4:
Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 140, in _call_sslobject_method
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 921, in read
    v = self._sslobj.read(len)
        ^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2580)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1123, in receive
    await self._protocol.read_event.wait()
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7905d01b4790

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
    yield
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 366, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 185, in handle_async_request
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 148, in handle_async_request
    status, headers = await self._receive_response(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 292, in _receive_response
    event = await self._receive_stream_event(request, stream_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
    await self._receive_events(request, stream_id)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
    events = await self._read_incoming_data(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
    raise exc
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_async/http2.py", line 438, in _read_incoming_data
    data = await self._network_stream.read(self.READ_NUM_BYTES, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 31, in read
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 178, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 182, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 228, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(prefix, hash_type=hash_type, etag=etag, encoding=encoding_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 269, in pwnedpasswords_get
    response = await httpx_binary_response(url=url, etag=etag, encoding=encoding, debug=httpx_debug)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 53, in httpx_binary_response
    response = await client.send(request=request, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 365, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout
^CSIGINT received, exiting.
SIGINT received, exiting.
SIGINT received, exiting.
SIGINT received, exiting.
^CSIGINT received, exiting.
Exception ignored in atexit callback: <function _exit_function at 0x7905d137ce00>
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.11/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 199, in _finalize_join
    thread.join()
  File "/usr/lib/python3.11/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ndejong/test/.venv/lib/python3.11/site-packages/hibp_downloader/main.py", line 20, in sigint_handler
    exit()
  File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: None
ndejong commented 10 months ago

Note to self

ndejong commented 10 months ago

New release v0.3.0 that might resolve this issue, unfortunately it's been really hard to replicate the errors on this to chase it down - https://github.com/threatpatrols/hibp-downloader/releases/tag/0.3.0

There are two key changes that I'm expecting should move the needle on this issue though -

The code is more robust around handling worker threads when they have HTTPX exceptions so we should see less infinite loop scenarios, however clean thread exiting (eg SIGINT with ^C) is still not a thing :(

Please do send feedback so we can gauge if this resolves.

wildegnux commented 10 months ago

Occasional errors still occur on Python 3.11.6 (on EndeavourOS):

2024-01-24T14:50:16+0100 | DEBUG | hibp-downloader | worker_index=14 prefix='0413a' hash_type='sha1' encoding_type='gzip' ignore_etag=False local_cache_ttl=43200 start_timestamp=2024-01-24 14:50:16.201230+01:00
2024-01-24T14:50:16+0100 | DEBUG | hibp-downloader | No existing metadata file 'data2/sha1/04/13/0413a.meta' using an empty entry
2024-01-24T14:50:46+0100 | ERROR | hibp-downloader | Unable to establish connection https://api.pwnedpasswords.com/range/0413a

Debug log doesn't seem to tell me anything else about those threads that might be useful.

But the failing threads do exit cleanly now so there's no more infinite wait. And to workaround the errors once could just run it in a loop until it stops giving errors.

wildegnux commented 10 months ago

I can also reproduce the same errors on a different machine (in a different network) with Python 3.10.12 on Ubuntu 22.04.3 LTS (running under WSL)

ndejong commented 10 months ago

Okay, that helps - are you able to provide a list of the installed Python packages in each virtual environment you have there?

wildegnux commented 10 months ago

Python 3.11.6:

aiofiles==23.2.1
anyio==4.2.0
certifi==2023.11.17
click==8.1.7
colorama==0.4.6
h11==0.14.0
h2==4.1.0
hibp-downloader==0.3.0
hpack==4.0.0
httpcore==1.0.2
httpx==0.26.0
hyperframe==6.0.1
idna==3.6
markdown-it-py==3.0.0
mdurl==0.1.2
Pygments==2.17.2
rich==13.7.0
shellingham==1.5.4
sniffio==1.3.0
typer==0.9.0
typing_extensions==4.9.0

Python 3.10.12:

aiofiles==23.2.1
anyio==4.2.0
certifi==2023.11.17
click==8.1.7
colorama==0.4.6
exceptiongroup==1.2.0
h11==0.14.0
h2==4.1.0
hibp-downloader==0.3.0
hpack==4.0.0
httpcore==1.0.2
httpx==0.26.0
hyperframe==6.0.1
idna==3.6
markdown-it-py==3.0.0
mdurl==0.1.2
Pygments==2.17.2
rich==13.7.0
shellingham==1.5.4
sniffio==1.3.0
typer==0.9.0
typing_extensions==4.9.0
FlorianLoch commented 10 months ago

I am also experiencing the Unable to establish connection errors - and, from reading the logs, I assume they are not retried? 🤔

ndejong commented 10 months ago

Yes, retries are not a thing in the current code - would surely help

Current thinking is that the timeouts would make sense if you consider this thing creates a lot connections which may in-turn cause an upstream network component to grok - for example a small home router may not have enough NAT state table and then intermittently show connection problems.

This line of thinking might explain why I've rarely been able to replicate this most of the time.

Could you give it another spin and reduce the --processes and --chunk-size values - if this is the underlaying issue then a feature to dynamically scale back to suit the local network conditions might be the thing needed here

ndejong commented 9 months ago

New release 0.3.1 implements retries and provides CLI options to adjust the default values https://pypi.org/project/hibp-downloader/0.3.1/

pip install --upgrade hibp-downloader

Feedback would be great

wildegnux commented 8 months ago

Sorry for the lack of feedback! Will run some test now!

wildegnux commented 8 months ago

I still run into some threads crashing:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 140, in _call_sslobject_method
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 921, in read
    v = self._sslobj.read(len)
        ^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2580)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 115, in fail_after
    yield cancel_scope
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 34, in read
    return await self._stream.receive(max_bytes=max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 205, in receive
    data = await self._call_sslobject_method(self._ssl_object.read, max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/streams/tls.py", line 147, in _call_sslobject_method
    data = await self.transport_stream.receive()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1123, in receive
    await self._protocol.read_event.wait()
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 72b64bf7c550

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with anyio.fail_after(timeout):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/anyio/_core/_tasks.py", line 118, in fail_after
    raise TimeoutError
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
    yield
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 361, in __aiter__
    async for part in self._stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 582, in __aiter__
    raise exc
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 572, in __aiter__
    async for chunk in self._connection._receive_response_body(
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 313, in _receive_response_body
    event = await self._receive_stream_event(request, stream_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 333, in _receive_stream_event
    await self._receive_events(request, stream_id)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 361, in _receive_events
    events = await self._read_incoming_data(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 452, in _read_incoming_data
    raise exc
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_async/http2.py", line 438, in _read_incoming_data
    data = await self._network_stream.read(self.READ_NUM_BYTES, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 31, in read
    with map_exceptions(exc_map):
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 193, in queue_worker_process
    asyncio.run(pwnedpasswords_get_store_gather(result_queue, hash_prefixes, worker_args))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
2024-03-08T14:55:04+0100 | ESC[1;37mDEBUGESC[0m | hibp-downloader | Request attempt 1 of 5 for 'https://api.pwnedpasswords.com/range/0ec91'
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 197, in pwnedpasswords_get_store_gather
    metadata_results = await asyncio.gather(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 254, in pwnedpasswords_get_and_store_async
    binary, metadata_latest = await pwnedpasswords_get(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/commands/hibp_download.py", line 306, in pwnedpasswords_get
    response = await httpx_binary_response(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 74, in httpx_binary_response
    response.binary = b"".join([part async for part in response.aiter_raw()])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/hibp_downloader/lib/http.py", line 74, in <listcomp>
    response.binary = b"".join([part async for part in response.aiter_raw()])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_models.py", line 991, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_client.py", line 147, in __aiter__
    async for chunk in self._stream:
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 251, in __aiter__
    with map_httpcore_exceptions():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/dam/.virtualenvs/pwn/lib/python3.11/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

And now some retries (nice :+1: )

2024-03-08T14:53:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/05cbc'
2024-03-08T14:53:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/05cbf'
2024-03-08T14:53:51+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/075ea'
2024-03-08T14:54:14+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0922a'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0ccec'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0cced'
2024-03-08T14:55:04+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/0ccf2'
2024-03-08T15:11:25+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/4e3ac'
2024-03-08T15:12:56+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/540c3'
2024-03-08T15:19:08+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/6bb9d'
2024-03-08T15:19:08+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/6bba1'
2024-03-08T15:39:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/b9611'
2024-03-08T15:39:30+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/b9615'
2024-03-08T15:40:41+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/bdb53'
2024-03-08T15:47:58+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/da102'

Unclear to me if that crashed thread is one of the retries...

That test was run on my laptop in my office (wifi) network using the default options. Running the same right now on my home network. Will also run a test on my home desktop (WSL).

wildegnux commented 8 months ago

Test from home network: No crashed threads, but more of these failed downloads:

2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | worker_index=6 prefix='d3215' hash_type='sha1' encoding_type='gzip' http_timeout=30 http_max_retries=5 http_debug=Falseignore_etag=False local_cache_ttl=43200 start_timestamp=2024-03-09 18:54:56.486975+01:00
2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | No existing metadata file 'test_home/sha1/d3/21/d3215.meta' using an empty entry
2024-03-09T18:54:56+0100 | DEBUG | hibp-downloader | Request attempt 1 of 5 for 'https://api.pwnedpasswords.com/range/d3215'
2024-03-09T18:55:06+0100 | WARNING | hibp-downloader | Request [1 of 5] failed for 'GET' 'https://api.pwnedpasswords.com/range/d3215'
2024-03-09T18:55:06+0100 | DEBUG | hibp-downloader | Request attempt 2 of 5 for 'https://api.pwnedpasswords.com/range/d3215'

But all failures succeeded on attempt 2. :+1:

wildegnux commented 8 months ago

Test in WSL (home network): With default options (which on this machine means 32 processes) it runs full boar for ~30-60 sec, then completely stops for ~30 sec (matching the http-timeout value). Then all the active download fails and it continues going at full speed again for another minute and then stops again and so on. Reducing processes / chunks size does improve things, but I had to go down to 8 processes and 1 chunk for getting it to not occasionally lock up. Might be some funky stuff in the WSL network stack I guess, since it worked mostly fine one the linux laptop in the same network.