lexiforest / curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License
2.49k stars, 265 forks

[BUG] Websocket implementation uses crazy amount of CPU #346

Closed rdamaj closed 1 month ago

rdamaj commented 4 months ago

Describe the bug

I have been running some tests with the websocket implementation and comparing it to websocket-client, the library that, according to the README, inspired curl_cffi's websocket API. When I use curl_cffi locally on my M2 Mac, I get 99% CPU usage for a simple test. The same connection to the same socket using the actual websocket-client library uses 4%. I discovered this because I am running an ECS service in AWS that depends on a websocket connection, and I needed to switch over to curl_cffi because there seemed to be some fingerprint detection. Now my service is using almost all of its CPU.

To Reproduce

Here is the code using websocket-client library

import websocket
import time
import rel

def on_message(ws, message):
    print(message)

def on_error(ws, error):
    print(error)

def on_close(ws, close_status_code, close_msg):
    print("### closed ###")

def on_open(ws):
    print("Opened connection")

if __name__ == "__main__":
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp("wss://api.gemini.com/v1/marketdata/BTCUSD",
                              on_open=on_open,
                              on_message=on_message,
                              on_error=on_error,
                              on_close=on_close)

    ws.run_forever(dispatcher=rel, reconnect=5)  # Set dispatcher to automatic reconnection, 5 second reconnect delay if connection closed unexpectedly
    rel.signal(2, rel.abort)  # Keyboard Interrupt
    rel.dispatch()

And here is the code using curl_cffi

from curl_cffi.requests import Session

def on_message(ws, message):
    # Print each incoming message, as in the websocket-client example
    print(message)

test_url = "wss://api.gemini.com/v1/marketdata/BTCUSD"
with Session() as s:
    ws = s.ws_connect(
        test_url,
        on_message=on_message,
    )
    ws.run_forever()
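
For anyone reproducing the comparison without an external monitor, here is a minimal sketch of one way to put a number on CPU usage from inside the process. It compares CPU time to wall-clock time using only the standard library. The helper names (`cpu_utilization`, `spin`, `idle`) are made up for illustration and are not part of curl_cffi or websocket-client.

```python
import time


def cpu_utilization(fn, *args, **kwargs):
    """Run fn and return (result, cpu_seconds / wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, (cpu_used / wall_used if wall_used > 0 else 0.0)


def spin(seconds):
    """Busy loop: burns CPU for roughly `seconds` of wall time."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def idle(seconds):
    """Sleeping: consumes wall time but almost no CPU."""
    time.sleep(seconds)


if __name__ == "__main__":
    _, busy = cpu_utilization(spin, 0.5)
    _, quiet = cpu_utilization(idle, 0.5)
    print(f"busy loop: {busy:.0%}, sleeping: {quiet:.0%}")
```

To apply this to either client, wrap a bounded run (for example, a function that receives a fixed number of messages and then returns) instead of `spin` or `idle`.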

Versions

aiofiles==23.2.1 aiogram==3.7.0 aiohttp==3.9.5 aiohttp-retry==2.8.3 aiosignal==1.3.1 altgraph==0.17.4 annotated-types==0.7.0 anyio==4.3.0 async-timeout==4.0.3 asyncio==3.4.3 attrs==23.1.0 awsebcli==3.20.5 base58==2.1.1 bcrypt==4.0.1 beautifulsoup4==4.12.3 bitarray==2.8.1 blackswan @ file:///Users/rdamaj/repos/blackswan/backend/blackswan blessed==1.20.0 blinker==1.6.2 borsh-construct==0.1.0 boto3==1.26.148 botocore==1.29.148 Brotli==1.1.0 bs4==0.0.2 cached-property==1.5.2 cement==2.8.2 certifi==2024.6.2 cffi==1.16.0 chardet==5.2.0 charset-normalizer==2.0.12 click==8.1.3 colorama==0.4.3 construct==2.10.68 construct-typing==0.5.6 cryptography==40.0.2 curl_cffi==0.7.0 cytoolz==0.12.2 deepdiff==6.7.1 discord==2.3.2 discord.py==2.3.2 distlib @ file:///private/tmp/python-distlib-20231213-5138-jol7s6/distlib-0.3.8 dnspython==2.3.0 docker==4.4.4 docker-compose==1.25.5 dockerpty==0.4.1 docopt==0.6.2 eth-abi==4.2.1 eth-account==0.11.0 eth-hash==0.5.2 eth-keyfile==0.6.1 eth-keys==0.4.0 eth-rlp==0.3.0 eth-typing==3.5.2 eth-utils==2.3.1 filelock @ file:///private/tmp/python-filelock-20231112-5668-1chgxqw/filelock-3.13.1/dist/filelock-3.13.1-py3-none-any.whl#sha256=57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c Flask==2.3.2 Flask-Cors==3.0.10 frozenlist==1.4.0 gevent==23.7.0 greenlet==2.0.2 h11==0.14.0 helius==0.0.3 hexbytes==0.3.1 httpcore==1.0.5 httpx==0.27.0 idna==3.4 iniconfig==2.0.0 itsdangerous==2.1.2 Jinja2==3.1.2 jmespath==1.0.1 jsonalias==0.1.1 jsonschema==4.19.0 jsonschema-specifications==2023.7.1 lru-dict==1.2.0 macholib==1.16.3 magic-filter==1.0.12 MarkupSafe==2.1.2 mock==5.1.0 mpmath @ file:///private/tmp/python-mpmath-20231112-5470-18fvay2/mpmath-1.3.0 multidict==6.0.4 networkx @ file:///private/tmp/python-networkx-20231112-5787-1vfizee/networkx-3.2.1/dist/networkx-3.2.1-py3-none-any.whl#sha256=9d72450cd74f1e630af85739ac9967362e2a7ebe03d3164ce3af0659ab910890 oddsjam @ file:///Users/rdamaj/repos/darkhorse/backend/oddsjam oddsjam-api==0.2.9 
ordered-set==4.1.0 outcome==1.3.0.post0 packaging==23.2 paramiko==3.1.0 parsimonious==0.9.0 pathspec==0.10.1 pbr @ file:///private/tmp/python-pbr-20231123-5531-s83en4/pbr-6.0.0 Pillow==10.1.0 platformdirs @ file:///private/tmp/python-platformdirs-20231204-5689-lioyxf/platformdirs-4.1.0 pluggy==1.3.0 protobuf==4.24.3 psutil @ file:///private/tmp/python-psutil-20231217-4775-x1h5hx/psutil-5.9.7 py-dotenv==0.1 pyaes==1.6.1 pyasn1==0.6.0 pycparser==2.21 pycryptodome==3.18.0 pydantic==2.7.3 pydantic_core==2.18.4 pyfcm==1.5.4 pyinstaller==6.6.0 pyinstaller-hooks-contrib==2024.6 PyJWT==2.7.0 pymongo==4.5.0 PyNaCl==1.5.0 pyrsistent==0.19.3 pyserum==0.5.0a0 PySocks==1.7.1 pytest==7.4.3 python-dateutil==2.8.2 python-dotenv==1.0.0 python-multipart==0.0.6 pytz==2023.3 pyunormalize==15.0.0 PyYAML==5.4.1 referencing==0.30.2 regex==2023.8.8 rel==0.4.9.19 reportlab==4.0.9 requests==2.26.0 rlp==3.0.0 rpds-py==0.10.3 rsa==4.9 s3transfer==0.6.1 selenium==4.18.1 semantic-version==2.8.5 six==1.14.0 slack_sdk==3.27.0 sniffio==1.3.1 solana==0.33.0 solc-select==1.0.4 solders==0.21.0 sortedcontainers==2.4.0 soupsieve==2.5 sumtypes==0.1a6 sympy @ file:///private/tmp/python-sympy-20231112-5693-1ycvyx1/sympy-1.12 Telethon==1.35.0 termcolor==1.1.0 texttable==1.6.7 toolz==0.12.0 trio==0.24.0 trio-websocket==0.11.1 twilio==9.0.0 typing_extensions==4.10.0 urllib3==1.26.7 wcwidth==0.1.9 web3==6.17.2 websocket-client==1.8.0 websockets==11.0.3 Werkzeug==2.3.4 wsproto==1.2.0 yarl==1.9.2 zope.event==5.0 zope.interface==6.0

Additional context

rdamaj commented 4 months ago

I know this is not the typical content/format of a bug report. Does using curl_cffi for websocket connections just inherently mean using a lot more CPU? I have to use it, because otherwise the websocket I am connecting to in prod denies my connection.

perklet commented 4 months ago

Websocket support is experimental in libcurl, and there is no API like curl_ws_poll, so a busy polling loop is currently used to read messages from a websocket connection.

https://github.com/yifeikong/curl_cffi/blob/630a4dcfc24a73ace0e71d364aae1457cebc3fc0/curl_cffi/requests/websockets.py#L97-L101

We might need to add a time.sleep(0.001) here.
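
To illustrate the idea, here is a rough sketch of a polling loop with and without a short sleep between empty polls. This is not the curl_cffi code linked above. `FakeSource` and `recv_blocking` are hypothetical stand-ins, and "returns None when nothing is available" is an assumed convention for a non-blocking receive.

```python
import time
from collections import deque


class FakeSource:
    """Hypothetical stand-in for a non-blocking receive; not curl_cffi's API."""

    def __init__(self, messages, ready_after):
        self._messages = deque(messages)
        self._ready_at = time.monotonic() + ready_after
        self.polls = 0

    def recv_nowait(self):
        """Return the next message, or None when nothing is available yet."""
        self.polls += 1
        if time.monotonic() < self._ready_at or not self._messages:
            return None
        return self._messages.popleft()


def recv_blocking(source, idle_sleep=0.001):
    """Poll until a message arrives, sleeping between empty polls.

    idle_sleep=0 reproduces a pure busy loop.
    """
    while True:
        msg = source.recv_nowait()
        if msg is not None:
            return msg
        if idle_sleep:
            time.sleep(idle_sleep)


if __name__ == "__main__":
    for pause in (0, 0.001):
        src = FakeSource([b"hello"], ready_after=0.2)
        recv_blocking(src, idle_sleep=pause)
        print(f"idle_sleep={pause}: {src.polls} polls before a message arrived")
```

The sleep trades a little latency (up to one sleep interval per message) for far fewer wasted polls while the connection is idle.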

rdamaj commented 4 months ago

Do you think that will suffice to reduce the CPU usage? If so, that would be fine for me; I can just implement that in my own fork or something.

perklet commented 4 months ago

It should be. I remember someone told me they had to add that somewhere. If it works for you, please open a PR here so that others can benefit, too.

rdamaj commented 4 months ago

OK, going to try that under line 98, maybe.

rdamaj commented 4 months ago

Pretty weird results. I added a sleep and tested two sockets: one websocket I don't need data from, just a dummy test (Gemini), and the socket I am actually trying to listen to (a site called dexscreener).

  1. For the one I am just using as a sanity test (Gemini), the sleep fixed the issue.
  2. For the other socket it did not; it is still using lots of CPU. I measured the rate of incoming messages on both, and the total bytes coming in differ by about 3-4x between Gemini and dexscreener.

Let me know if you still want me to make the PR. For my purposes the problem is not solved: before dexscreener changed something on their end to check fingerprints, I was able to listen with normal CPU usage, and now I can't listen at all without curl_cffi. I imagine the issue is something specific to the source I am listening to, though, and that in most cases the sleep fixes it for people.
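
For comparing the incoming rate of two feeds, something like the following could work. It is a minimal sketch: `RateMeter` is a made-up helper, and it assumes the callback is invoked as `on_message(ws, message)`, the shape used in the snippets earlier in this thread.

```python
import time


class RateMeter:
    """Counts messages and bytes so two feeds can be compared."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.messages = 0
        self.bytes = 0

    def on_message(self, ws, message):
        """Callback with the (ws, message) shape used earlier in the thread."""
        self.messages += 1
        if isinstance(message, str):
            # Text frames are counted by their UTF-8 encoded size
            message = message.encode("utf-8")
        self.bytes += len(message)

    def rates(self):
        """Return (messages_per_second, bytes_per_second) since creation."""
        elapsed = max(self._clock() - self._start, 1e-9)
        return self.messages / elapsed, self.bytes / elapsed
```

Create one meter per feed, pass `meter.on_message` as the `on_message` callback, and call `meter.rates()` after letting each feed run for the same amount of time.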

perklet commented 4 months ago

Have you tried adjusting the amount of time to sleep?

rdamaj commented 4 months ago

Yep, it works totally fine to mitigate the issue with Gemini (and I assume other sources), just not dexscreener specifically.

usedev commented 1 month ago

pr: https://github.com/lexiforest/curl_cffi/pull/413

from curl_cffi.requests import Session

def main():
    with Session() as s:
        ws = s.ws_connect("wss://echo.websocket.org/")
        while True:
            ws.send(b"hello")
            content, flags = ws.recv()
            print(content)

if __name__ == '__main__':
    main()

before: 99%

after: 0.1%
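
The PR's diff is not reproduced in this thread, so the following is only general background and not a description of what the PR does. One common alternative to busy polling is to block until the socket becomes readable. This sketch uses the standard library's `select` on a local socket pair. `wait_readable` is a hypothetical helper, and it is an assumption that the underlying socket would be reachable for this purpose.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Block until sock has data or timeout (seconds) elapses; no spinning."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable(a, 0.1))  # nothing has been sent yet
        b.sendall(b"hello")
        print(wait_readable(a, 0.1))  # data is now pending
        print(a.recv(16))
    finally:
        a.close()
        b.close()
```

Readiness only says that bytes have arrived, not that a complete websocket frame is available, so a receive loop built this way still has to handle partial reads.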