replicate / replicate-python

Python client for Replicate
https://replicate.com
Apache License 2.0

httpx/httpcore ReadTimeouts in replicate.async_run #394

Open nicoluca opened 2 weeks ago

nicoluca commented 2 weeks ago

Hi! I am suddenly seeing a lot of ReadTimeouts. I initially thought it might be a temporary issue on Replicate's side, but they persist. This may be similar to https://github.com/replicate/replicate-python/issues/345, but there is no further info there.

The behavior doesn't seem to be deterministic. E.g. for my last predictions, 8 out of 10 images were downloaded correctly (all were created). Example ReadTimeout exception:

Traceback (most recent call last):
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
    yield
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<REDACTED>/src/util/api/replicate_api.py", line 78, in generate_images
    result = await task
             ^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 189, in async_run
    return await async_run(
           ^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/run.py", line 96, in async_run
    prediction = await client.predictions.async_create(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/prediction.py", line 586, in async_create
    resp = await self._client._async_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 94, in _async_request
    resp = await self._async_client.request(method, path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1585, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 319, in handle_async_request
    response = await self._wrapped_transport.handle_async_request(request)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadTimeout

I'm now wrapping the call in retries, which seems to help, but it is still rather unsatisfactory.
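For anyone hitting the same thing, here is a minimal sketch of what such a retry wrapper could look like, assuming exponential backoff with a little jitter. `retry_async` and `FlakyCall` are hypothetical names for illustration, not part of replicate or httpx, and the stand-in raises the built-in `TimeoutError`; in real use you would presumably pass `retry_on=(httpx.ReadTimeout,)` and a function that calls `replicate.async_run(...)`.

```python
import asyncio
import random


async def retry_async(make_call, *, attempts=4, base_delay=0.5, retry_on=(TimeoutError,)):
    """Await make_call() up to `attempts` times, backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            # make_call must build a fresh awaitable on every attempt.
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from many concurrent tasks.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


class FlakyCall:
    """Hypothetical stand-in for a prediction call: fails `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated read timeout")
        return "ok"


if __name__ == "__main__":
    flaky = FlakyCall(failures=2)
    print(asyncio.run(retry_async(flaky, base_delay=0.01)), flaky.calls)
```

One caveat with this approach: a read timeout while creating a prediction does not prove the server rejected the request, so a blind retry may create (and bill) a duplicate prediction.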

aron commented 2 weeks ago

Hi @nicoluca, could you give me some additional information about how you're using the library? It looks like you're using replicate.run(). Are you providing any additional arguments besides the model and inputs?

If possible, which model are you using?

nicoluca commented 2 weeks ago

Hi @aron,

Example call for a ComfyUI workflow:

replicate.async_run(
    model,
    input={
        "output_format": "png",
        "output_quality": 100,
        "randomise_seeds": True,
        "workflow_json": json.dumps(comfyui_json_dict),
    },
)

Example call for nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa:

replicate.async_run(
    model,
    input={
        "image": open(image_path, "rb"),
        "scale": 2,
        "face_enhance": False,
    },
)

For the latter call I'm also seeing ReadErrors and 502s, but that only appears to happen when I start too many predictions concurrently (e.g. a batch of 6 predictions just went through fine, while a batch of ~100 did not go through at all).
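Since the failures seem tied to how many predictions are in flight at once, one possible client-side mitigation is to cap concurrency with a semaphore. The following is only a sketch under that assumption: `run_limited` and `Tracker` are hypothetical helpers, `Tracker.fake_prediction` stands in for `replicate.async_run(...)`, and the limit of 5 is an arbitrary illustrative value, not a documented Replicate limit.

```python
import asyncio


async def run_limited(factories, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results come back in input order; failures are returned as exception
    objects rather than raised, so one timeout does not cancel the batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories), return_exceptions=True)


class Tracker:
    """Hypothetical stand-in that records the peak number of concurrent calls."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def fake_prediction(self, i):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # simulate network latency
        self.active -= 1
        return i


if __name__ == "__main__":
    tracker = Tracker()
    jobs = [lambda i=i: tracker.fake_prediction(i) for i in range(20)]
    results = asyncio.run(run_limited(jobs, limit=5))
    print(len(results), "results, peak concurrency", tracker.peak)
```

Passing factories (callables that create the coroutine) rather than already-created tasks matters here: the request is only started once the semaphore has been acquired.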

aron commented 2 weeks ago

Thanks, that's helpful. And just to be absolutely sure, are you using the latest release, v1.0.3?

nicoluca commented 2 weeks ago

I was on v1.0.2 but just tried again with v1.0.3, and there is no difference. My feeling is that it occurs more often the more predictions you start concurrently, which sometimes also leads to 5xx responses from the server. I believe replicate is just using its default timeout values?