scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

HTTP 2 support #1854

Closed povilasb closed 3 years ago

povilasb commented 8 years ago

I peeked at the docs and the issues but couldn't find any info about HTTP/2 support.

Does scrapy support it?

redapple commented 8 years ago

@povilasb , no, scrapy only supports HTTP/1.0 and HTTP/1.1 (we currently use twisted.web.client.Agent)
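
For context, a minimal sketch of the kind of twisted.web.client.Agent usage that such an HTTP/1.1 client involves; this is illustrative only, not Scrapy's actual download handler code:

```python
# Minimal twisted.web.client.Agent usage: the HTTP/1.1 client API that
# Scrapy's download handler builds on. Illustrative only, not Scrapy code.
from twisted.internet import reactor
from twisted.web.client import Agent, readBody


def show(body):
    # print the first part of the response body
    print(body[:200])


agent = Agent(reactor)
d = agent.request(b"GET", b"http://example.com/")
d.addCallback(readBody)              # read the full body into bytes
d.addCallback(show)
d.addBoth(lambda _: reactor.stop())  # stop the reactor on success or error
reactor.run()
```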

pawelmhm commented 8 years ago

Looking into Twisted, I found out there is work on adding HTTP/2 support to Twisted.web: https://twistedmatrix.com/trac/ticket/7460 Once this is merged upstream, do you think Scrapy should also follow suit and add HTTP/2 support?

mguerreiro commented 7 years ago

Up, this has been merged :)

kmike commented 7 years ago

It seems the ticket is for HTTP2 server support, not for client. There is an example of twisted http2 client in python-hyper docs: https://python-hyper.org/projects/h2/en/stable/twisted-post-example.html

pawelmhm commented 7 years ago

It seems like work on the Twisted client is not making much progress right now, but the example you linked, @kmike, doesn't look terribly complicated, so maybe we could just add h2 as a Scrapy requirement and write our own HTTP/2 Twisted-Scrapy client?
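
To make the "use h2" idea concrete, here is a rough sketch of the sans-IO side of such a client (assumptions: the h2 package and its state-machine API; the Twisted transport wiring, i.e. TLS with ALPN "h2", is omitted entirely):

```python
# Sketch of the sans-IO side of an h2-based client: the H2Connection state
# machine produces bytes to write and parses bytes read from the transport.
# Connecting it to a Twisted protocol/transport is not shown here.
import h2.config
import h2.connection

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()

stream_id = conn.get_next_available_stream_id()
conn.send_headers(
    stream_id,
    headers=[
        (":method", "GET"),
        (":authority", "example.com"),
        (":scheme", "https"),
        (":path", "/"),
    ],
    end_stream=True,
)

outgoing = conn.data_to_send()  # bytes to write to the (TLS) transport
# Bytes received from the transport are fed back in, yielding events such as
# ResponseReceived and DataReceived:
#     events = conn.receive_data(incoming_bytes)
```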

AnshulMalik commented 7 years ago

Hi, I am a student from India, planning to participate in GSoC this year. There was a project idea for a "New HTTP/1.1 download handler"; it was needed because of some issues with the Twisted API, but those issues have recently been resolved (@redapple told me). Then I realized that Scrapy doesn't have support for HTTP/2, so why not choose this as a GSoC project?

I have also done a bit of research on Twisted's HTTP/2 work and on python-hyper, which is a nice implementation of HTTP/2; Twisted itself uses h2 for its HTTP/2 support. Currently, Twisted only supports HTTP/2 on the server side (@kmike).

As @pawelmhm mentioned, one way is to use h2 and start the work. Since we don't know when Twisted will add HTTP/2 client support, we should write our own client.

I am seeking your feedback on this idea: should I do it, what should I keep in mind, and where can I find more info about HTTP/2? And most importantly, is this idea worth it? (Twisted might add client-side HTTP/2 support in the future.)

alexeyqu commented 6 years ago

Hi, I'm a 5th-year CS student at MIPT, Russia, planning to participate in GSoC this year. I have been writing Python for over 3 years and teach it at MIPT; a more detailed CV is here.

This year, HTTP/2 support is mentioned in the ideas list. At a low level, Scrapy uses Twisted for HTTP connections. After looking at the Twisted and Hyper repos, I realized that in terms of client-side HTTP/2 support nothing has changed since @AnshulMalik mentioned it here (except for this gist with a simple HTTP/2 client, but it has been around for a long time). That, obviously, gives some points to the "use h2 and start the work" approach.

I wonder, is there a reason for the silence on this topic (both here and in Twisted)? (I don't have any experience with HTTP/2 internals yet, so maybe I'm missing something crucial.) I've just started familiarizing myself with the Scrapy codebase; could you suggest some good first bugs to fix?

Upd: since this issue is really old and hard to find, I'm going to summon the two possible mentors here: @dangra @lopuhin.

kmike commented 6 years ago

Hey @alexeyqu! That's a feature that could make Scrapy more resource-efficient in some specific cases, not a bug that affects day-to-day work; that's probably the reason it gets less attention.

Glennvd commented 6 years ago

Hey @kmike, I'm afraid I disagree. I'm seeing an increase in websites (using a specific vendor) that detect bot traffic by checking whether the HTTP version (HTTP/2 vs. HTTP/1.1) matches the expected default for the claimed user agent. While I do understand that this does not make it a priority for the Twisted team, I'm sure it has a significant impact for a lot of Scrapy users.

sakshamb2113 commented 4 years ago

@Gallaecio I am interested in this issue. Can I work on this?

adityaa30 commented 4 years ago

@Gallaecio @wRAR Hey. I am a 3rd-year CS student from NITT, India. I am planning to participate in GSoC this year.

Upd: after some googling, I feel the example provided in the hyper-h2 docs, as mentioned by @kmike, can be used as base code, and we can go with the "use h2 and start the work" approach?

Gallaecio commented 4 years ago

@adityaa30 Sounds good to me :slightly_smiling_face:

adityaa30 commented 4 years ago

@Gallaecio Thanks for the reply. I have been studying the workflow of http11.py and wanted to know which key features I should be focusing on.

Gallaecio commented 4 years ago

Well, I guess it would be best to start by getting a working implementation, something that allows using HTTP/2.

When that works, the next step would be to add support for what our HTTP/1.1 implementation currently supports: mostly, making sure that the settings that can be used to modify how HTTP/1.1 works also work similarly for HTTP/2 (where possible).

I think that would make the project complete. Extra goals may involve exposing additional features of HTTP/2 through settings, if they can be useful for web scraping. I wonder, for example, if it would make sense to allow users to limit concurrent requests per TCP connection; I’m not sure, though.
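
To give an idea of the scope, here is a rough skeleton, under the assumption that the new handler mirrors the interface of the existing HTTP/1.1 handler; all names and the HTTP/2 internals below are illustrative placeholders, not actual proposal code:

```python
# Hypothetical outline of an HTTP/2 download handler for Scrapy. Only the
# interface is sketched; the HTTP/2 machinery itself is left as comments.
from twisted.internet import defer


class H2DownloadHandler:  # illustrative name, not an existing class (yet)
    def __init__(self, settings):
        self.settings = settings
        self._connections = {}  # would hold pooled HTTP/2 connections per host

    def download_request(self, request, spider):
        """Return a Deferred that fires with a Response for the given request."""
        d = defer.Deferred()
        # negotiate HTTP/2 via ALPN ("h2"), open a stream, send headers/body,
        # collect HEADERS/DATA frames, build a scrapy Response, then
        # d.callback(response)
        return d

    def close(self):
        # tear down any pooled connections when the downloader shuts down
        return defer.succeed(None)
```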

adityaa30 commented 4 years ago

@Gallaecio Thanks. I am working on a demo implementation now.

adityaa30 commented 4 years ago

@Gallaecio @wRAR I have prepared a draft of my application on Google docs https://docs.google.com/document/d/1AUjXgK9u1QcxqdjJxQObecgHQEKZfzO3U96ivBvblTw/edit?usp=sharing

I would be happy to get feedback on the proposal before submitting it. The doc is open for public comments.

Gallaecio commented 4 years ago

The proposal looks great to me, very detailed :+1:

One topic that it does not cover, though, and one that I think may be especially important once there is HTTP/2 support in Scrapy, is how users will be able to configure which protocol is used. At the moment we have settings like DOWNLOAD_HANDLERS, DOWNLOADER_HTTPCLIENTFACTORY, DOWNLOADER_CLIENTCONTEXTFACTORY, DOWNLOADER_CLIENT_TLS_*. It would be great if you could think about how HTTP/2 support fits into the picture, and whether we need to make changes here (e.g. settings value changes, new settings, etc.) to support scenarios such as using HTTP/2 when possible but falling back to 1.1 otherwise, or the opposite.
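
For readers following along: DOWNLOAD_HANDLERS already maps URI schemes to handler class paths, so one possible shape is a per-scheme override. The HTTP/2 entry below is purely illustrative (no such class existed at this point in the thread); the HTTP/1.1 line reflects the shipped default:

```python
# settings.py sketch: per-scheme download handlers. The https entry is a
# made-up class path used only to illustrate where HTTP/2 could plug in.
DOWNLOAD_HANDLERS = {
    # default HTTP/1.1 handler shipped with Scrapy:
    "http": "scrapy.core.downloader.handlers.http.HTTPDownloadHandler",
    # hypothetical opt-in HTTP/2 handler:
    "https": "myproject.handlers.HTTP2DownloadHandler",
}
```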

adityaa30 commented 4 years ago

@Gallaecio Thanks a lot for the review.

> HTTP/2 support in Scrapy, is how users will be able to configure which protocol is used.

I will work on this now and update my proposal.

adityaa30 commented 4 years ago

@Gallaecio @wRAR I have updated my proposal. If you wouldn't mind taking another look, I would like to get some more feedback. Link to the proposal remains the same.

Gallaecio commented 4 years ago

I cannot think of more feedback at the moment. It looks really thought-out and well put. :+1:

adityaa30 commented 4 years ago

@Gallaecio Thanks a lot.

wRAR commented 4 years ago

I like the proposal, @adityaa30, thank you.

adityaa30 commented 4 years ago

@wRAR Thank you! 🙂

jseyfert commented 3 years ago

> Hey @kmike, I'm afraid I disagree. I'm seeing an increase in websites (using a specific vendor) that detect bot traffic by checking whether the HTTP version (HTTP/2 vs. HTTP/1.1) matches the expected default for the claimed user agent. While I do understand that this does not make it a priority for the Twisted team, I'm sure it has a significant impact for a lot of Scrapy users.

Hi @Glennvd, I have encountered multiple sites that blocked Scrapy due to HTTP/1.1, but I am finding they are only identifying my spider because Scrapy automatically capitalizes the header keys. I have been able to use a workaround that has worked 100% of the time so far.

For example, this is what the site expects:

    accept-encoding: gzip, deflate, br
    accept-language: en-US,en;q=0.9,hi;q=0.8,pt;q=0.7

And this is what Scrapy sends (even if you make them lowercase in the spider):

    Accept-Encoding: gzip, deflate, br
    Accept-Language: en-US,en;q=0.9,hi;q=0.8,pt;q=0.7
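
For what it's worth, the capitalization appears to come from Scrapy's Headers class normalizing header-key case; a quick check (a small illustration, not part of the workaround itself):

```python
# Observe how Scrapy normalizes header keys (assumption: the Headers class
# is what is doing the capitalization described above).
from scrapy.http import Headers

h = Headers({"accept-encoding": "gzip, deflate, br"})
print(list(h.keys()))  # [b'Accept-Encoding']
```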

This seems to be a workaround to keep the keys lowercase:

    headers={
        # note: both entries below use the same key "", so a plain Python
        # dict literal keeps only the last one as written
        "": "accept-encoding: gzip, deflate, br",
        "": "accept-language: en-US,en;q=0.9,hi;q=0.8,pt;q=0.7",
    },

I hope this helps someone :)

GeorgeA92 commented 3 years ago

Hi, @jseyfert.

> I have encountered multiple sites that blocked Scrapy due to HTTP/1.1, but I am finding they are only identifying my spider because Scrapy automatically capitalizes the header keys.

It is related to #2711

ikumar5am commented 5 months ago

Experimental HTTP/2 support has now been added to Scrapy; check out https://docs.scrapy.org/en/latest/topics/settings.html
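
For anyone landing here: per those docs, the experimental handler is enabled with a per-scheme DOWNLOAD_HANDLERS override along these lines (HTTPS only; check the current documentation for the exact class path and caveats):

```python
# settings.py: opt in to the experimental HTTP/2 download handler for HTTPS,
# as documented at the time of writing.
DOWNLOAD_HANDLERS = {
    "https": "scrapy.core.downloader.handlers.http2.H2DownloadHandler",
}
```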