lexiforest / curl_cffi

Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License

[BUG] Failed to perform curl: (18) or Failed to perform, curl: (55) when streaming #319

Open ripperdoc opened 5 months ago

ripperdoc commented 5 months ago

Describe the bug I'm downloading files with async and streaming. It randomly fails, often when repeating. It could be related to #302 . I mostly get error 18, sometimes 55.

 curl_cffi.requests.errors.RequestsError: Failed to perform, curl: (18). See https://curl.se/libcurl/c/libcurl-errors.html first for more details. 

or

Traceback (most recent call last):
  File "./loaders.py", line 225, in load
    async for chunk in response.aiter_content():
  File "./venv/lib/python3.11/site-packages/curl_cffi/requests/models.py", line 246, in aiter_content
    raise chunk
curl_cffi.requests.errors.RequestsError: Failed to perform, curl: (55) Recv failure: Connection reset by peer. See https://curl.se/libcurl/c/libcurl-errors.html first for more details.

To Reproduce

async def test_curl_cffi_error():
    import asyncio
    import tempfile

    import aiofiles
    from curl_cffi import CurlHttpVersion
    from curl_cffi.requests import AsyncSession

    http = AsyncSession()
    # If using HTTP 1.1, it works
    # http = AsyncSession(http_version=CurlHttpVersion.V1_1)

    # Also tested on the beta release

    # Note that some other URLs seem to work
    uri = "https://kunskapsstyrningvard.se/download/18.888036617b192361ee22c6a/1628608381981/Vardforlopp-hjartsvikt-nydebuterad.pptx"

    # Broken
    uri = "https://firebasestorage.googleapis.com/v0/b/fictive-dev.appspot.com/o/videos%2Fdemo_e718ef.docx?alt=media&token=3ad7735f-212d-4372-b5b9-d5171a63d3c6"

    response = await http.get(
        uri,
        default_headers=True,
        stream=True,
        impersonate="chrome",
    )
    length = 0
    with tempfile.NamedTemporaryFile(delete=True) as temp:
        async with aiofiles.open(temp.name, "wb") as f:
            async for chunk in response.aiter_content():
                await f.write(chunk)
                length += len(chunk)
    assert length > 0
    response.close()

    await asyncio.sleep(1)
    # if I reset the session between, it works
    # http = AsyncSession()

    # If I comment this out, it works
    response = await http.get(
        uri,
        default_headers=True,
        stream=True,
        impersonate="chrome",
    )
    length = 0
    with tempfile.NamedTemporaryFile(delete=True) as temp:
        async with aiofiles.open(temp.name, "wb") as f:
            async for chunk in response.aiter_content():
                await f.write(chunk)
                length += len(chunk)
    assert length > 0
    response.close()

Expected behavior

I expect to be able to reuse a session and download files both in parallel and sequentially. Of course, I could reset the session every time as a workaround, but that could have other implications. Or is there a better way to stream files than aiofiles?

Versions

perklet commented 5 months ago

Possible cause: https://github.com/curl/curl/issues/4915

perklet commented 5 months ago

A simple fix is to force a new connection each time:

    async def pop_curl(self):
        curl = await self.pool.get()
        if curl is None:
            curl = Curl(debug=self.debug)
+       curl.setopt(CurlOpt.FRESH_CONNECT, 1)
        return curl