scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

AttributeError: 'Decompressor' object has no attribute 'process' #6252

Closed: ehan03 closed this issue 2 months ago

ehan03 commented 2 months ago

Description

Upon making a POST request using scrapy.Request, I get the following error: AttributeError: 'Decompressor' object has no attribute 'process'

The full traceback is as follows:

2024-02-27 20:41:35 [scrapy.core.scraper] ERROR: Error downloading <POST https://api.fightinsider.io/gql>
Traceback (most recent call last):
  File "C:\Users\eugen\anaconda3\envs\ufc\lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\eugen\anaconda3\envs\ufc\lib\site-packages\scrapy\core\downloader\middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "C:\Users\eugen\anaconda3\envs\ufc\lib\site-packages\scrapy\downloadermiddlewares\httpcompression.py", line 90, in process_response
    decoded_body = self._decode(
  File "C:\Users\eugen\anaconda3\envs\ufc\lib\site-packages\scrapy\downloadermiddlewares\httpcompression.py", line 134, in _decode
    return _unbrotli(body, max_size=max_size)
  File "C:\Users\eugen\anaconda3\envs\ufc\lib\site-packages\scrapy\utils\_compression.py", line 64, in _unbrotli
    output_chunk = decompressor.process(input_chunk)
AttributeError: 'Decompressor' object has no attribute 'process'

This only started happening after I upgraded Scrapy to the latest release (2.11.1). Reinstalling from scratch did not help either.

Opening the file the traceback points to shows the following: [screenshot not preserved]

Expected behavior:

Previously, I would get no errors and my data would be parsed fine.

Actual behavior:

AttributeError: 'Decompressor' object has no attribute 'process'

Reproduces how often:

This occurs every time I start crawling.

Versions

Scrapy       : 2.11.1
lxml         : 4.9.1.0
libxml2      : 2.9.14 
cssselect    : 1.2.0  
parsel       : 1.8.1  
w3lib        : 2.1.1  
Twisted      : 22.10.0
Python       : 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)]
pyOpenSSL    : 23.0.0 (OpenSSL 3.0.8 7 Feb 2023)
cryptography : 39.0.2
Platform     : Windows-10-10.0.19045-SP0

ehan03 commented 2 months ago

I was able to fix this by changing decompressor.process to decompressor.decompress in the _unbrotli function of scrapy/utils/_compression.py:

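from io import BytesIO

import brotli

# _CHUNK_SIZE and _DecompressionMaxSizeExceeded are defined elsewhere
# in scrapy/utils/_compression.py.


def _unbrotli(data: bytes, *, max_size: int = 0) -> bytes:
    decompressor = brotli.Decompressor()
    input_stream = BytesIO(data)
    output_stream = BytesIO()
    output_chunk = b"."
    decompressed_size = 0
    # Feed the compressed body in fixed-size chunks until no more output is produced.
    while output_chunk:
        input_chunk = input_stream.read(_CHUNK_SIZE)
        # was: output_chunk = decompressor.process(input_chunk)
        output_chunk = decompressor.decompress(input_chunk)
        decompressed_size += len(output_chunk)
        # Abort early if the decompressed body grows past the configured limit.
        if max_size and decompressed_size > max_size:
            raise _DecompressionMaxSizeExceeded(
                f"The number of bytes decompressed so far "
                f"({decompressed_size} B) exceed the specified maximum "
                f"({max_size} B)."
            )
        output_stream.write(output_chunk)
    output_stream.seek(0)
    return output_stream.read()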

This seems like a temporary, hacky fix, however.
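A less invasive alternative to hand-editing the installed Scrapy source is a helper that tolerates both Decompressor APIs. This is a minimal sketch, assuming (as the replies in this thread point to) that Google's brotli package exposes Decompressor.process() while the deprecated brotlipy package exposes Decompressor.decompress(). The names decompress_chunk, stream_decompress, FakeGoogleBrotli and FakeBrotlipy are hypothetical and not Scrapy API, _CHUNK_SIZE is an assumed value, and ValueError stands in for Scrapy's _DecompressionMaxSizeExceeded. The fake classes simply pass bytes through so the example runs without either package installed.

```python
from io import BytesIO

_CHUNK_SIZE = 65536  # assumed chunk size for this sketch


def decompress_chunk(decompressor: object, chunk: bytes) -> bytes:
    """Feed one chunk to a brotli-style decompressor, whichever API it has.

    Assumption: Google's brotli uses process(); brotlipy uses decompress().
    """
    process = getattr(decompressor, "process", None)
    if callable(process):
        return process(chunk)
    return decompressor.decompress(chunk)  # brotlipy-style fallback


def stream_decompress(decompressor: object, data: bytes, max_size: int = 0) -> bytes:
    """Hypothetical chunked loop modelled on the _unbrotli code quoted above."""
    input_stream = BytesIO(data)
    output_stream = BytesIO()
    output_chunk = b"."
    decompressed_size = 0
    while output_chunk:
        input_chunk = input_stream.read(_CHUNK_SIZE)
        output_chunk = decompress_chunk(decompressor, input_chunk)
        decompressed_size += len(output_chunk)
        if max_size and decompressed_size > max_size:
            # Scrapy raises its own _DecompressionMaxSizeExceeded here.
            raise ValueError(
                f"decompressed {decompressed_size} B, limit is {max_size} B"
            )
        output_stream.write(output_chunk)
    return output_stream.getvalue()


class FakeGoogleBrotli:
    """Stand-in for a process()-style Decompressor; passes bytes through."""

    def __init__(self) -> None:
        self.calls: list = []

    def process(self, data: bytes) -> bytes:
        self.calls.append("process")
        return data


class FakeBrotlipy:
    """Stand-in for a decompress()-style Decompressor; passes bytes through."""

    def __init__(self) -> None:
        self.calls: list = []

    def decompress(self, data: bytes) -> bytes:
        self.calls.append("decompress")
        return data
```

Probing with getattr instead of catching AttributeError around the call keeps a genuine AttributeError raised inside process() from being silently rerouted to the other API.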

Gallaecio commented 2 months ago

Could it be that you are using the deprecated brotlipy package (no release since 2017) instead of brotli?

ehan03 commented 2 months ago

Could it be that you are using the deprecated brotlipy package (no release since 2017) instead of brotli?

Not sure, because to my knowledge I never installed brotlipy in my env. It's weird because I had to make the fix inside of Scrapy's utils as found here

wRAR commented 2 months ago

Well, do you have brotli or brotlipy?

It's weird because I had to make the fix inside of Scrapy's utils as found here

It's not that weird, as decompressor is a brotli.Decompressor, so which methods it has depends on the package that provides the brotli module.
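One way to answer "brotli or brotlipy?" is to check, from inside the environment Scrapy runs in, which distributions are installed, which file import brotli resolves to, and which Decompressor API that module exposes. This is a minimal sketch assuming the distribution names Brotli (Google's package) and brotlipy, both of which provide a module named brotli; brotli_report is a hypothetical helper, not part of Scrapy. If both distributions report a version, module_origin shows which copy the import actually resolves to.

```python
import importlib
import importlib.metadata
import importlib.util
from typing import Dict, Optional


def brotli_report() -> Dict[str, Optional[str]]:
    """Summarise which brotli implementation would be imported (hypothetical helper).

    Assumption: the distribution names are "Brotli" and "brotlipy".
    """
    report: Dict[str, Optional[str]] = {}
    for dist_name in ("Brotli", "brotlipy"):
        try:
            report[dist_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            report[dist_name] = None  # not installed in this environment

    # Locate the module without importing it yet.
    spec = importlib.util.find_spec("brotli")
    report["module_origin"] = spec.origin if spec else None

    report["decompressor_api"] = None
    if spec is not None:
        try:
            brotli = importlib.import_module("brotli")
        except ImportError:
            return report  # module found but failed to load
        decompressor = brotli.Decompressor()
        if hasattr(decompressor, "process"):
            report["decompressor_api"] = "process"  # what Scrapy 2.11.1 calls
        elif hasattr(decompressor, "decompress"):
            report["decompressor_api"] = "decompress"  # brotlipy-style API
    return report


if __name__ == "__main__":
    for key, value in brotli_report().items():
        print(f"{key}: {value}")
```

Running this with the same interpreter that runs Scrapy matters: a different interpreter or environment can resolve brotli to a different package.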

ehan03 commented 2 months ago

I have brotli

ehan03 commented 2 months ago

I just did a clean reinstall of my env, which fixed it. Sorry for wasting your time.

wRAR commented 2 months ago

Improved in #6261

ehan03 commented 2 months ago

Ah, I was using conda. That makes more sense now.