lexiforest / curl_cffi

Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://curl-cffi.readthedocs.io/
MIT License

[BUG] Abnormal exit when using stream download #371

Open lyy077 opened 1 month ago

lyy077 commented 1 month ago

Describe the bug

double free or corruption (fasttop): 0x00007fb9e003add0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x7fba70ce1329]
/root/.pyenv/versions/web-env/lib/python3.12/site-packages/curl_cffi/_wrapper.abi3.so(+0x4f8db)[0x7fba63fb18db]
/root/.pyenv/versions/web-env/lib/python3.12/site-packages/curl_cffi/_wrapper.abi3.so(curl_easy_reset+0x22)[0x7fba63f97472]
/root/.pyenv/versions/web-env/lib/python3.12/site-packages/curl_cffi/_wrapper.abi3.so(+0x30f3f)[0x7fba63f92f3f]
/root/.pyenv/versions/3.12.1/lib/libpython3.12.so.1.0(_PyEval_EvalFrameDefault+0x627f)[0x7fba71a6c32f]
/root/.pyenv/versions/3.12.1/lib/libpython3.12.so.1.0(+0x1898d3)[0x7fba71adc8d3]
/root/.pyenv/versions/3.12.1/lib/libpython3.12.so.1.0(_PyObject_Call+0x4d)[0x7fba71adb0cd]
/root/.pyenv/versions/3.12.1/lib/libpython3.12.so.1.0(+0x3a4046)[0x7fba71cf7046]
/root/.pyenv/versions/3.12.1/lib/libpython3.12.so.1.0(+0x32cf37)[0x7fba71c7ff37]
/lib64/libpthread.so.0(+0x7ea5)[0x7fba7173eea5]
/lib64/libc.so.6(clone+0x6d)[0x7fba70d5eb0d]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fd:01 1564096 /root/.pyenv/versions/3.12.1/bin/python3.12
00600000-00601000 r--p 00000000 fd:01 1564096 /root/.pyenv/versions/3.12.1/bin/python3.12
00601000-00602000 rw-p 00001000 fd:01 1564096 /root/.pyenv/versions/3.12.1/bin/python3.12
0214b000-0366f000 rw-p 00000000 00:00 0 [heap]

To Reproduce

import gzip

from curl_cffi.requests import Session

resp = Session().post(url, headers=headers, proxies=proxies, data=data, json=json,
                      allow_redirects=allow_redirects, impersonate='chrome110',
                      timeout=timeout, verify=verify, params=params, stream=stream)
with gzip.GzipFile(fileobj=compressed_buffer, mode='wb') as gzip_file:
    for chunk in resp.iter_content():
        if chunk:
            # write each chunk into the gzip compression stream
            gzip_file.write(chunk)
resp.close()

Versions

perklet commented 1 month ago

Please provide your URL and params, otherwise I cannot reproduce this on my side.

lyy077 commented 4 weeks ago

This does not happen when accessing any one specific URL. The concrete scenario: on a 4-core/8 GB CentOS 7 server, a single process with 200 threads requests a variety of URLs. After running for a while, the problem above appears and the program gets killed automatically. Today a new kind of log showed up: malloc_consolidate(): invalid chunk size. The server load is normal; both CPU and memory usage are low.

lexiforest commented 4 weeks ago

This is hard for me to debug as well. The underlying libcurl simply has no particularly suitable API for this, so the stream implementation is very tricky, with locks and queues everywhere. I suggest you try multiprocessing; it may behave better than multithreading.
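[Editor's note: a minimal sketch of the suggested multiprocessing route. The worker body, URLs, and result shape are hypothetical; the curl_cffi calls are left as comments because the real request parameters live in the issue's own code. Creating the session inside the worker keeps all libcurl state private to each process.]

```python
from concurrent.futures import ProcessPoolExecutor
import os


def fetch(url):
    """Fetch one URL inside a worker process (sketch).

    The commented-out lines show where the streaming download from the
    reproduce snippet would go; this placeholder just returns the URL
    and the worker's PID for illustration.
    """
    # from curl_cffi.requests import Session
    # with Session() as s:
    #     resp = s.get(url, impersonate="chrome110", stream=True)
    #     for chunk in resp.iter_content():
    #         ...  # write chunk to a file or buffer
    #     resp.close()
    return url, os.getpid()


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    with ProcessPoolExecutor(max_workers=2) as pool:
        for url, pid in pool.map(fetch, urls):
            print(url, pid)
```

Because each process owns its own libcurl handles, a double free in one worker cannot corrupt the others, and the pool can simply restart a crashed worker.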