omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License

Invalid image content when making request with use_stealth=True #89

Status: Closed (closed by Wixome 1 month ago)

Wixome commented 3 months ago

Hi! When I make a request with use_stealth=True, the response content differs from the content returned by the same request without use_stealth. Here are the first 20 bytes of the content in each case:

use_stealth=False
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00'
use_stealth=True
b'\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x00\x10JFIF\x00\x01'
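
A note on the two outputs above: `\xef\xbf\xbd` is the UTF-8 encoding of U+FFFD, the Unicode replacement character. One reading of the stealth output is that the binary body was decoded as UTF-8 text somewhere along the way and then encoded back to bytes. This is an inference from the bytes shown, not confirmed botasaurus behaviour. The minimal sketch below, using only the standard library, applies that round trip to the non-stealth bytes and compares the result with the stealth bytes:

```python
# First 20 bytes reported for use_stealth=False (copied from the output above).
original = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00'

# First 20 bytes reported for use_stealth=True (copied from the output above).
observed = b'\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x00\x10JFIF\x00\x01'

# Assumption: the body was treated as text at some point. Bytes that are not
# valid UTF-8 (here the JPEG marker bytes) are replaced with U+FFFD, and each
# U+FFFD becomes the three bytes \xef\xbf\xbd when encoded again.
round_tripped = original.decode('utf-8', errors='replace').encode('utf-8')

# Compare the same 20-byte window the issue printed.
print(round_tripped[:20] == observed)
```

If the comparison holds, the original marker bytes cannot be recovered from the stealth response, because every invalid byte collapses to the same replacement sequence.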

Reproduction code:

from botasaurus import *

IMAGE_URL = 'https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg'

@request()
def without_stealth(request: AntiDetectRequests, data):
    response = request.get(IMAGE_URL)
    print(response.content[:20])  # starts with the JPEG marker bytes \xff\xd8\xff\xe0
    # Write the raw bytes; the context manager closes the file.
    with open('1.jpg', 'wb') as f:
        f.write(response.content)

@request(use_stealth=True)
def with_stealth(request: AntiDetectRequests, data):
    response = request.get(IMAGE_URL)
    print(response.content[:20])  # leading bytes arrive as \xef\xbf\xbd instead
    with open('2.jpg', 'wb') as f:
        f.write(response.content)

without_stealth()
with_stealth()

Chetan11-dev commented 1 month ago

This will happen with use_stealth=True. I recommend using hrequests to download images instead.
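
Whichever client ends up fetching the image, it can help to check the bytes before writing them to disk. The sketch below is a hedged, standard-library-only example; the function name `classify_image_bytes` and its return labels are hypothetical and not part of botasaurus or hrequests. It relies only on the JPEG start marker `\xff\xd8\xff` and on the replacement sequence seen in this issue.

```python
# UTF-8 encoding of U+FFFD, the replacement character: b'\xef\xbf\xbd'.
REPLACEMENT = '\ufffd'.encode('utf-8')

# Every JPEG file starts with the SOI marker followed by another marker byte.
JPEG_MAGIC = b'\xff\xd8\xff'


def classify_image_bytes(content: bytes) -> str:
    """Hypothetical helper: label a downloaded payload.

    Returns 'ok' if the payload starts like a JPEG, 'mangled' if it starts
    with the replacement sequence (a sign it was decoded as text), and
    'not-jpeg' otherwise.
    """
    if content.startswith(JPEG_MAGIC):
        return 'ok'
    if content.startswith(REPLACEMENT):
        return 'mangled'
    return 'not-jpeg'


# The two prefixes reported in this issue.
plain = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00'
stealth = b'\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\x00\x10JFIF\x00\x01'

print(classify_image_bytes(plain))
print(classify_image_bytes(stealth))
```

A check like this only inspects the first few bytes, so it flags the failure reported here but does not validate the rest of the file.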