lexiforest / curl_cffi

Python binding for a curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP/2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License
2.38k stars, 257 forks

[BUG] Timeout issues in stream mode #215

Closed. ZentixUA closed this issue 9 months ago.

ZentixUA commented 9 months ago

Describe the bug

With some hosts, the timeout seems to have no effect during streaming: closing the connection on certain sources can take 20 or even 30 seconds, even when no new data is coming through. Other sources can hang completely: even with a timeout setting of (30, 60 * 10), they stay open for hours without closing.

To Reproduce

import asyncio
import json
import sys
import uuid

from curl_cffi.requests import AsyncSession

url = "https://huggingface.co/chat/conversation"
data = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "preprompt": ""
}
chat_data = {
    'inputs': 'Hey! Say a few words',
    'id': str(uuid.uuid4()),
    'response_id': str(uuid.uuid4()),
    'is_retry': False,
    'web_search': False
}

async def main():
    async with AsyncSession() as session:
        response = await session.request(
            'POST',
            url,
            json=data,
            impersonate="chrome120",
            default_headers=False
        )
        conv = response.json()['conversationId']
        async with session.stream(
                'POST',
                f'{url}/{conv}',
                json=chat_data,
                cookies=response.cookies
        ) as ai_response:
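            async for line in ai_response.aiter_lines():
                if not line:
                    continue  # skip blank lines so json.loads() does not fail
                message = json.loads(line)
                if message.get('type') == 'finalAnswer':
                    print('Final answer, breaking...')
                    # The loop exits here, but closing the connection can still take 20-30 s
                    break
                print(message)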

if __name__ == '__main__':
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    asyncio.run(main())

I can provide another example, but it is fairly private, so perklet can contact me on Telegram :)

Expected behavior

The connection closes immediately.

Versions

coletdjnz commented 9 months ago

For streaming, the timeout is currently only a connection timeout and not a read timeout, as per https://github.com/yifeikong/curl_cffi/issues/156.

You can emulate a read timeout by setting these options before making the request:

import math
from curl_cffi import CurlOpt
session.curl.setopt(CurlOpt.LOW_SPEED_LIMIT, 1)  # 1 byte per second
session.curl.setopt(CurlOpt.LOW_SPEED_TIME, math.ceil(20.0))  # abort after 20 seconds below 1 byte/s
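To make the semantics of these two options concrete: libcurl aborts a transfer when its speed stays below LOW_SPEED_LIMIT bytes per second for LOW_SPEED_TIME seconds, so with a limit of 1 byte/s it behaves like an idle (read) timeout. Below is a simplified, hypothetical model of that rule in pure Python. It samples the speed once per second, whereas libcurl's real check uses its own averaged speed measurement, so treat it as an illustration rather than libcurl's implementation. The function name low_speed_abort_time is made up for this sketch.

```python
from typing import Optional, Sequence


def low_speed_abort_time(
    bytes_per_second: Sequence[int],
    low_speed_limit: int = 1,
    low_speed_time: int = 20,
) -> Optional[int]:
    """Return the second at which a simplified LOW_SPEED check would abort.

    bytes_per_second[i] is the number of bytes received during second i.
    Returns None if the transfer is never aborted.
    (Simplified model: real libcurl uses an averaged speed measurement.)
    """
    slow_since: Optional[int] = None  # start of the current slow stretch
    for second, received in enumerate(bytes_per_second):
        if received < low_speed_limit:
            if slow_since is None:
                slow_since = second
            # Abort once the slow stretch has lasted low_speed_time seconds.
            if second - slow_since + 1 >= low_speed_time:
                return second + 1
        else:
            slow_since = None  # data arrived, so the slow stretch resets
    return None


if __name__ == "__main__":
    # 5 seconds of data, then silence: aborted 20 seconds after data stops.
    print(low_speed_abort_time([100] * 5 + [0] * 30))  # 25
```

In this model a stream that keeps trickling at least 1 byte every second is never aborted, however long it runs, which is why these options act as a read timeout rather than a total timeout.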
ZentixUA commented 9 months ago

With AsyncSession, this fails:

session.acurl.setopt(CurlOpt.LOW_SPEED_LIMIT, 1)

TypeError: initializer for ctype 'void *' must be a cdata pointer, not int

perklet commented 9 months ago

acurl is not a Curl instance: AsyncSession uses a pool of curl instances. Pass the options when creating the session instead:

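import math
from curl_cffi import CurlOpt
from curl_cffi.requests import AsyncSession

session = AsyncSession(
    ...,  # other session arguments
    curl_options={
        CurlOpt.LOW_SPEED_LIMIT: 1,  # 1 byte per second
        CurlOpt.LOW_SPEED_TIME: math.ceil(20.0),  # abort after 20 s below that speed
    },
)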
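If you would rather enforce an idle limit in your own code, instead of or on top of libcurl's low-speed options, one generic approach is to wrap the line iterator so that each next item must arrive within a deadline. The sketch below is a plain asyncio technique, not a curl_cffi API; the helper name iter_with_idle_timeout is hypothetical. Note that it only stops your loop; it does not by itself close the underlying libcurl connection.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def iter_with_idle_timeout(
    source: AsyncIterator[T], idle_timeout: float
) -> AsyncIterator[T]:
    """Yield items from source, raising asyncio.TimeoutError if no item
    arrives within idle_timeout seconds (hypothetical helper, plain asyncio)."""
    iterator = source.__aiter__()
    while True:
        try:
            # Wait for the next item, but never longer than idle_timeout.
            item = await asyncio.wait_for(iterator.__anext__(), idle_timeout)
        except StopAsyncIteration:
            return  # the source is exhausted
        yield item


# Possible usage with the stream from the report (network call, illustration only):
#     async for line in iter_with_idle_timeout(ai_response.aiter_lines(), 20):
#         ...
```

The timer restarts for every item, so a long but steadily flowing stream is never interrupted; only a gap longer than idle_timeout raises.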
ZentixUA commented 9 months ago

Thank you. What about breaking out of the loop in the middle of iterating over the content, as in the example? In aiohttp, for instance, the response closes immediately, but with curl_cffi I have to wait 20-30 seconds. :(

perklet commented 9 months ago

Sorry, libcurl does not let us close the connection early. You can break out of the iteration loop, but the connection will keep running in the background anyway. If you are not making many requests, it may be acceptable to simply let the connections close on their own.

perklet commented 9 months ago

(Quoting coletdjnz's suggestion above: in stream mode, emulate a read timeout with CurlOpt.LOW_SPEED_LIMIT and CurlOpt.LOW_SPEED_TIME.)

This has been made the default as of v0.6.0b9.
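For illustration only, here is one way a (connect, read) timeout tuple, such as the (30, 60 * 10) from the report, could be translated into libcurl options for a streaming request. The function name stream_timeout_options and the exact mapping are assumptions, not curl_cffi's actual code; option names are plain strings so the sketch runs without curl_cffi installed.

```python
import math
from typing import Dict, Tuple, Union

Timeout = Union[float, Tuple[float, float]]


def stream_timeout_options(timeout: Timeout) -> Dict[str, int]:
    """Hypothetical mapping of a requests-style timeout to libcurl options
    for a streaming transfer (a sketch, not curl_cffi's implementation).

    No total TIMEOUT is set, because that would cut off long but healthy
    streams; instead the read part becomes a low-speed (idle) limit.
    """
    if isinstance(timeout, tuple):
        connect_timeout, read_timeout = timeout
    else:
        # A single number is used for both parts in this sketch.
        connect_timeout = read_timeout = timeout
    return {
        # Give up if the connection phase takes longer than connect_timeout.
        "CONNECTTIMEOUT_MS": int(connect_timeout * 1000),
        # Abort if the speed stays below 1 byte/s for read_timeout seconds.
        "LOW_SPEED_LIMIT": 1,
        "LOW_SPEED_TIME": math.ceil(read_timeout),
    }


if __name__ == "__main__":
    print(stream_timeout_options((30, 60 * 10)))
```

Under this assumed mapping, the report's (30, 600) would let a stream run indefinitely while data keeps arriving and abort it only after 600 seconds without data.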