scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
51.16k stars 10.35k forks source link

Warn about br handling if brotlipy is not installed #4697

Open Gallaecio opened 3 years ago

Gallaecio commented 3 years ago

Currently, the HTTP compression middleware sets the Accept-Encoding header to gzip,deflate or gzip,deflate,br depending on whether or not brotlipy is installed.

However, if the user manually sets the Accept-Encoding header to a value containing br, and does not have brotlipy installed, the server can send a response compressed with br and Scrapy will silently fail to decompress it, so the response content will not be readable.

I think we should improve a bit the handling of this scenario.

If the user sets br manually on the Accept-Encoding header, I think we should:

The 3rd bullet point may not be that useful, but I think the 1st and 2nd are.

Gallaecio commented 3 years ago

That tool sounds very interesting!

Regarding this specific issue, it could be argued that it is indeed a good first issue. I’m unsure, though.

Dushatar commented 3 years ago

Hello. Me and two others would like to take a look at this issue as part of a course we are taking @Gallaecio

Hopefully we can make some progress in the, not so long, time allocated to the assignment.

Gallaecio commented 3 years ago

@Dushatar Please, have a go at it, and let us know if you need any help :slightly_smiling_face:

Dushatar commented 3 years ago

We did not have much time designated for this in the course, it was mostly just to get the feel for open source projects. We pushed some code that does what was requested (as we understood it), but I am not sure if the way we did it is as intended.

It (1) logs a warning if brotlipy is not installed and (2) raise a ImportError exception if a user is sent an br-encoded response while not having brotlipy installed.

Gallaecio commented 3 years ago

It’s quite a good start, but it would need some adjustments to get merged. It should definitely make things easier for the next person that comes around.

Anupam-USP commented 3 years ago

I am very new, sorry to ask silly questions but after changing files on my machine how can I check if this is causing some errors or is it ok to push and later create a pr?

Gallaecio commented 3 years ago

After changing files on my machine how can I check if this is causing some errors or is it ok to push and later create a pr?

You can run automated tests locally. But feel free to create a pull request and let the Continuous Integration system run tests automatically in multiple different environments. If your changes are not ready for a review yet, you can mark your pull request as a draft when creating it, and later mark it as ready for review.

PluT00 commented 2 years ago

Is this one still relevant? I'd like to work on it!

Gallaecio commented 2 years ago

There is https://github.com/scrapy/scrapy/pull/5480 already, pending review. You could review it and give your thoughts.

wRAR commented 1 year ago

Alternatively: #4698

Ad12y commented 3 weeks ago

Is this still relevant? If yes, I would like to work on it!

wRAR commented 1 day ago

@Ad12y it has a PR in progress but it's stalled, you can try making a fresh one (likely based on it).