scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy

When I trigger the download event and get the file response, an exception is thrown because of the wrong Content-Encoding #321

Status: Closed (ma-pony closed 2 weeks ago)

ma-pony commented 1 month ago

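When I trigger the download event and receive the file response, an exception is thrown because of an incorrect Content-Encoding. As the traceback shows, Scrapy's HttpCompressionMiddleware tries to gunzip the response body, but the body is the PDF itself (it starts with b'%P'), not gzip data.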

2024-10-08 15:26:39 [scrapy.core.scraper] ERROR: Error downloading <GET http://www.yanan.gov.cn/gk/fdzdgknr/zdxm/sphzbaxx/1833020334224224258.html>
Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: <200 http://www.yanan.gov.cn/upload/yanan/2024/09/09/202409091350252175.pdf>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 90, in process_response
    decoded_body = self._decode(
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 130, in _decode
    return gunzip(body, max_size=max_size)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/utils/gz.py", line 21, in gunzip
    chunk = f.read1(_CHUNK_SIZE)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 314, in read1
    return self._buffer.read1(size)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'%P')

elacuesta commented 2 weeks ago

Addressed by #322
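
For readers who hit the same traceback, the failure can be reproduced without Scrapy or Playwright: decompressing bytes that begin with `%PDF` raises `gzip.BadGzipFile`. Below is a minimal sketch using only the standard library. The helper names `looks_gzipped` and `safe_gunzip` are hypothetical illustrations of a defensive magic-byte check. They are not part of Scrapy or scrapy-playwright, and they are not a description of what #322 changes.

```python
import gzip

# Every gzip stream starts with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Hypothetical helper: True if the body starts with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC


def safe_gunzip(body: bytes) -> bytes:
    """Hypothetical helper: decompress only when the body really is gzip data.

    A body that is labelled gzip but is not gzip data is returned unchanged.
    """
    if not looks_gzipped(body):
        return body
    return gzip.decompress(body)


# Stand-in for the downloaded file: a PDF begins with b"%PDF".
pdf_body = b"%PDF-1.7\n(placeholder content)"

# Reproduce the error from the traceback: the data is not gzip.
error = None
try:
    gzip.decompress(pdf_body)
except gzip.BadGzipFile as exc:
    error = exc

# The defensive check leaves the PDF untouched and still handles real gzip data.
untouched = safe_gunzip(pdf_body)
roundtrip = safe_gunzip(gzip.compress(b"hello"))
```

The check only illustrates why the middleware fails. Whether the right fix is to correct the response headers or to skip decompression is a design decision for the handler.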