httprb / http

HTTP (The Gem! a.k.a. http.rb) - a fast Ruby HTTP client with a chainable API, streaming support, and timeouts
MIT License
3k stars 321 forks source link

Remove Content-Length: 0 from GET request #487

Closed zt2 closed 6 years ago

zt2 commented 6 years ago

When I find a GET request, there is always a Content-Length: 0 field in the HTTP header like

GET / HTTP/1.1
Connection: close
Host: localhost:8080
User-Agent: http.rb/3.3.0
Content-Length: 0

It happened in lib/http/request/writer.rb add_body_type_headers:

      # Adds the headers to the header array for the given request body we are working
      # with
      def add_body_type_headers
        return if @headers[Headers::CONTENT_LENGTH] || chunked?

        @request_header << "#{Headers::CONTENT_LENGTH}: #{@body.size}"
      end

In this case, I think the header Content-Length should only be included when you are sending a message-body which body size > 0.

This behavior trigger some bugs, for example:

require 'http'

res = HTTP.get 'https://registry.npmjs.org'
puts res.to_s

will cause an error:

[ztz@localhost tmp]$ ruby http.rb
No frontdoor hosts available

by using typhoeus:

require 'typhoeus'

res = Typhoeus.get 'https://registry.npmjs.org'
puts res.to_s

will output:

[ztz@localhost tmp]$ ruby typhoeus.rb 
{"db_name":"registry","doc_count":922224,"doc_del_count":342,"update_seq":11768451,"purge_seq":0,"compact_running":false,"disk_size":8918630986,"data_size":6431280597,"instance_start_time":"1528907394828029","disk_format_version":6,"committed_update_seq":11768449}

You can test by using commands below:

curl -I -H "Content-Length: 0" https://registry.npmjs.org
curl -I https://registry.npmjs.org

What you expected to happen:

GET request without Content-Length: 0

GET / HTTP/1.1
Connection: close
Host: localhost:8080
User-Agent: http.rb/3.3.0

What actually happened:

GET request with Content-Length: 0

GET / HTTP/1.1
Connection: close
Host: localhost:8080
User-Agent: http.rb/3.3.0
Content-Length: 0

Version of gem or commit ref you are using: http 3.3.0

Version of ruby you are using: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]

tarcieri commented 6 years ago

Per the HTTP/1.1 RFC including a Content-Length: 0 is perfectly valid and recommended (SHOULD):

https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Applications SHOULD use this field to indicate the transfer-length of the message-body, unless this is prohibited by the rules in section 4.4.

Any Content-Length greater than or equal to zero is a valid value.

The only exceptions are given in section 4.4, namely for Transfer-Encoding: chunked. See:

https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present).

If it's actually causing real-world interop problems we can discuss not sending it, but I just want to be clear that this is perfectly valid (and recommended) behavior per the RFC.

I would suggest additionally filing an issue against NPM and seeing what they say. If this is causing breakages it would seem they are not properly honoring the RFC.

ixti commented 6 years ago

I insist it's an error on NPM side. In fact quick digging shows that it's a problem of Cloudflare:

ap HTTP.get("https://registry.npmjs.org").headers 
{
              "Date" => "Tue, 26 Jun 2018 16:50:22 GMT",
      "Content-Type" => "text/plain;charset=UTF-8",
    "Content-Length" => "28",
        "Connection" => "close",
        "Set-Cookie" => "__cfduid=de333c475cc2be94adc11907473ba69c21530031809; expires=Wed, 26-Jun-19 16:50:09 GMT; path=/; domain=.registry.npmjs.org; HttpOnly",
         "Expect-Ct" => "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"",
            "Server" => "cloudflare",
            "Cf-Ray" => "43112bd8acaf2f6b-MAD"
}
ixti commented 6 years ago

In fact it seems like their upstream server is not correctly handling requests from cloudflare. Or cloudflare corrupts request somehow.

zt2 commented 6 years ago

@tarcieri @ixti thx for the answer, is it possible for user to enable/disable this feature? maybe some users want to fully control the data they send

zanker commented 6 years ago

If Cloudflare is having issues handling Content-Length: 0, we should consider removing it I think. Even if the spec says it's ok, I'd consider it not working on Cloudflare to be a decent indicator it'll break on things done by crazier companies.

On Tue, Jun 26, 2018 at 10:36 AM, ztz notifications@github.com wrote:

@tarcieri https://github.com/tarcieri @ixti https://github.com/ixti thx for the answer, is it possible for user to enable/disable this feature? maybe some users want to fully control the data they send

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/httprb/http/issues/487#issuecomment-400400563, or mute the thread https://github.com/notifications/unsubscribe-auth/AATvvQAIaLFh690IiVF-UOMdysMbRR9Qks5uAnEVgaJpZM4U36uG .

ixti commented 6 years ago

Probably we should:

content_length = @body.size
return unless 0 < content_length
@request_header << "Content-Length: #{content_length}"
zt2 commented 6 years ago

@ixti @zanker after test other hosts behind cloudflare, seems like registry.npmjs.org is the only host affected this bug.

zanker commented 6 years ago

I assume the Registry is implemented in Node? Maybe Node has an issue with 0 Content-Length's. Not quite as big as Cloudflare, but probably still enough to warrant a fix?

tarcieri commented 6 years ago

It would be good to narrow down the exact nature of the problem before we deviate from RFC2616's recommendations

ixti commented 6 years ago

The problem is that regisrty.npmjs.org server code seesm to be closed source. At least I failed to find it's sources.

tarcieri commented 6 years ago

@zanker I think the NPM registry servers are actually written in Rust...

zanker commented 6 years ago

😂

Sent from my iPhone

On Jun 27, 2018, at 06:47, Tony Arcieri notifications@github.com wrote:

@zanker I think the NPM registry servers are actually written in Rust...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

fcheung commented 3 years ago

Hi, just to dig up an old one - rfc 7230, which obsoletes 2616, seems to shift on this, saying

https://tools.ietf.org/html/rfc7230#section-3.3.2

A user agent SHOULD NOT send a Content-Length header field when the request message does not contain a payload body and the method semantics do not anticipate such a body

which I think you could argue would seem to apply to GET requests.

In terms of real world issues, this trips up AWS's desync mitigation mode, when set to its strictest mode https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#classification-reasons. (in other modes this is tracked / logged but with no other effects)

bdw429s commented 3 years ago

Please reconsider this. I'm having major issues with a Cloudflare Proxy using the Node Fetch library which is rejecting all requests from my Discourse software, which I believe uses this Ruby gem to match HTTP Get requests for link previews.

The developers of the Node Fetch library staunchly refuse to consider allowing a content-length header with HTTP GET requests and claim it's not strictly part of the spec, and therefore forbidden. I honestly don't care, but until you guys reach the same conclusion they have (or vice versa) I'm screwed :)

https://github.com/whatwg/fetch/issues/551 https://github.com/whatwg/fetch/issues/83

More details here: https://meta.discourse.org/t/link-preview-http-get-breaks-spec/200996

tarcieri commented 3 years ago

If someone opens a PR, we can consider merging it.

The developers of the Node Fetch library staunchly refuse to consider allowing a content-length header with HTTP GET requests and claim it's not strictly part of the spec, and therefore forbidden.

Citation please? You link to a couple of very long threads with a lot of back and forth, and based on a cursory perusal I'm not seeing any specs cited.

The closest thing I've seen is @fcheung's SHOULD NOT from RFC 7230. That would seem to indicate the current behavior is not optimal, but a SHOULD NOT does not violate the spec.

bdw429s commented 3 years ago

@tarcieri I would encourage you to engage the Node Fetch developers directly. They're the ones who disagree and have closed all the tickets entered for them to consider it. If you're asking for a citation of their staunch refusal, just read the threads. It's all over. If you want a citation of their reasoning, it's in there too, but again I don't have a dog in this race and frankly don't care what the spec does or doesn't say. I just want my 3rd party software developers to get along :) I'm not going to make a case for either one here, I'm just asking that everyone actually come to an agreement. When then Fetch devs are 100% certain they're right and close their tickets and you're 100% your right and you close your tickets, I'm the one stuck with non-functioning software, lol

tarcieri commented 3 years ago

I don't have time to do that, sorry

I don't have a dog in this race and frankly don't care what the spec does or doesn't say

The specs are authoritative, which is why I do care what they say.

bdw429s commented 3 years ago

The specs are authoritative, which is why I do care what they say.

Right, the issue here is that you claim they say one thing while the Fetch devs claim they say the opposite and they use the same "we're just following the spec" excuse.

Perhaps a better way to word what I said above wouldn't be "I don't care" but rather, "stop making this MY issue to resolve". There is an obvious incompatibility here with your code and the Fetch code and I'm not the person to sort it. You and the Fetch developers are. I'm just stuck in the middle and each of you arguing your points in your respective echo chambers isn't getting anyone anywhere. I'm not going to argue your points for you on the Fetch forum and I'm not going to argue the Fetch dev's points for them here. But I do need a resolution.

tarcieri commented 3 years ago

"stop making this MY issue to resolve".

I'm a volunteer donating my time to maintaining an OSS project. My time is limited and I have many other projects to maintain. I don't have time or interest to hop onto other projects forums and debate their developers.

You are suggesting this project is violating the HTTP spec. If so, show me. If not, based on the best available resources I have seen this far, we are not violating the spec.

PRs to change the behavior may be accepted, as it seems based on RFC 7230 we are not following best practices, however that does not amount to a violation of the spec.

bdw429s commented 3 years ago

Correcting my link above-- this is a better link to the actual fetch API repo where they argue that httprb is not following the spec:

https://github.com/node-fetch/node-fetch/pull/795

I'm a volunteer donating my time to maintaining an OSS project. My time is limited and I have many other projects to maintain. I don't have time or interest to hop onto other projects forums and debate their developers.

I understand that, but your software is also breaking other software I'm trying to use. Why you would continue to dig your heels in and refuse to even consider changing your behavior is beyond me. Isn't it at least worth re-opening the ticket?

You are suggesting this project is violating the HTTP spec

Not me. The Node Fetch API developers, the talking heads at https://whatwg.org/ and https://httpwg.org/ and every major browser suggest your project is breaking the spec. But even if you technically weren't by the letter of the law, wouldn't it still concern you that your libraries non-confirming behavior was causing issues? There is only one other HTTP client I've been able to find that agrees with your interpretation of the spec, and that makes you an outlier. And a technically-correct outlier claiming everyone else is wrong is still just as effectively broken.

PRs to change the behavior may be accepted

As an open source maintainer myself, I do appreciate the offer. I'm not a Ruby developer and I don't have any plans to start so I'm afraid I'm not the best person to try submitting code to any Ruby project. Which is why I also bugged the people over at Discourse who use your library as maybe they'll be able to lend a hand here.

tarcieri commented 3 years ago

Why you would continue to dig your heels in and refuse to even consider changing your behavior is beyond me.

What? My very first reply was "If someone opens a PR, we can consider merging it."

You even quote me saying the same thing. Why would you even say this?

Again, please open a PR if you want a change, or at the very least, open an issue with a relevant HTTP spec violation if you are able to find one.