ptpb / pb

pb is a formerly-lightweight pastebin and url shortener
Other
549 stars 52 forks source link

curl >=7.56 form post uses chunked transfer encoding, which is not supported by upstream libraries #204

Closed Earnestly closed 6 years ago

Earnestly commented 6 years ago

I'm not sure if there's an issue open for this already (I couldn't find one) and I'm not entirely sure if this is an issue with pb or curl, but here is what I'm experiencing.

For a long time I've used curl -F 'c=<-' 'https://ptpb.pw/?u=1' and redirected anything I've uploaded using a pipeline or redirection, however lately when using redirection with images (png, jpeg, etc.) I'm getting nonsense.

Here's an example:

% xxd foo.png | head -n 1
00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR

% alias pb="curl -F 'c=<-' 'https://ptpb.pw/?u=1'"

% pb < foo.png
https://ptpb.pw/OIBa

% curl -s https://ptpb.pw/OIBa | xxd | head -n 1
00000000: efbf bd50 4e47 0d0a 1a0a 0000 000d 4948  ...PNG........IH

However, this works if I use curl -F 'c=@foo.png' 'https://ptpb.pw/?u=1'. Both <- and @- seem to cause the same corruption and I'm not sure where to look to find out why.

Thanks

glensc commented 6 years ago

looks like your downloaded response includes UTF8 BOM header.

buhman commented 6 years ago

Both <- and @- seem to cause the same corruption and I'm not sure where to look to find out why.

I can't fully reproduce this:

λ zack [~] → curl -F c=@- 'https://ptpb.pw?u=1' < foo.png
https://ptpb.pw/_LKw
λ zack [~] → sha1sum foo.png
a0a66c3e16d3a7fe69b8e8dff80a825ab1fcb2b0  foo.png
λ zack [~] → curl -s https://ptpb.pw/_LKw | sha1sum
a0a66c3e16d3a7fe69b8e8dff80a825ab1fcb2b0  -

However, when I do this:

$ curl -F 'c=<-' 'https://ptpb.pw?u=1' < foo.png       
https://ptpb.pw/i4Q4
$ curl -s https://ptpb.pw/i4Q4 | sha1sum
c42e3552023c1ca6c9cf0645123da060a28b8438  -

I do get a modified file.

I looked at what curl is doing with a setup like this:

$ nc -l 12345 > request.txt &
$ curl -F 'c=<-' http://localhost:12345 < foo.png
$ xxd request.txt

I found that the part boundary edge (including leading \r\n\r\n) begins with:

0d0a 0d0a 8950 4e47

While the file itself begins with:

8950 4e47

This is contrast to the response from pb:

efbf bd50 4e47

buhman commented 6 years ago

@Earnestly I've known since the beginning that werkzeug meddles with request/form data in ways that contradict pb's goals, but I'm not able to reproduce your @- claim. The difference is that @- adds a Content-Type: application/octet-stream header in the form-data part, while <- does not.

It appears that only the latter case causes this byte-mangling behavior.

Earnestly commented 6 years ago

That's odd because @- is definately doing something strange on my end:

% alias p="curl -F 'c=@-' 'https://ptpb.pw/?u=1'"
% sha1sum 1.png 
230e1e4f84af3ce4b8e4f6565e8a327ac6aa3bf4  1.png
% p < 1.png 
https://ptpb.pw/36mE
% curl -s https://ptpb.pw/36mE | sha1sum
3bb38383c9da3bf9c6987c95bbdc4fad36dfa984  -

With curl -v this is what I get (is the Content-Type wrong?):

% curl -vF 'c=@-' 'https://ptpb.pw/?u=1' < 1.png
*   Trying 138.197.7.107...
* TCP_NODELAY set
* Connected to ptpb.pw (138.197.7.107) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=ptpb.pw
*  start date: Aug 20 05:02:00 2017 GMT
*  expire date: Nov 18 05:02:00 2017 GMT
*  subjectAltName: host "ptpb.pw" matched cert's "ptpb.pw"
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
> POST /?u=1 HTTP/1.1
> Host: ptpb.pw
> User-Agent: curl/7.56.0
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: multipart/form-data; boundary=------------------------f3ed7229b78f475b
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Tue, 10 Oct 2017 21:30:26 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 21
< Connection: keep-alive
< Vary: Accept
< Location: https://ptpb.pw/36mE
< Access-Control-Allow-Origin: *
< X-Varnish: 3248659
< Age: 0
< Via: 1.1 varnish-v4
< Accept-Ranges: bytes
< 
https://ptpb.pw/36mE
* Connection #0 to host ptpb.pw left intact
buhman commented 6 years ago

With curl -v this is what I get (is the Content-Type wrong?):

That's not showing the request body, so I don't know.

curl 7.56.0

Hmm... Nice. I was using the old curl 7.55.1-2. I can reproduce this issue exactly as you described once upgrading to 7.56.0. The difference is indeed that curl starts omitting the C-T header from the multipart header.

buhman commented 6 years ago

Looks like this is maybe a regression caused by https://github.com/curl/curl/pull/1839 merged in 7.56

commits here implements a new MIME API that can be used to replace the old form API, The cli tool has been changed to use it for the -F option instead of CURLOPT_HTTPPOST.

buhman commented 6 years ago

curl -F 'c=@-' http://localhost:12345 < foo.png

POST / HTTP/1.1
Host: localhost:12345
User-Agent: curl/7.55.1
Accept: */*
Content-Length: 17371
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------79ebd724699c3f79

--------------------------79ebd724699c3f79
Content-Disposition: form-data; name="c"; filename="-"
Content-Type: application/octet-stream
POST / HTTP/1.1
Host: localhost:12345
User-Agent: curl/7.56.0
Accept: */*
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=------------------------36791b33cd7e514b
Expect: 100-continue

3ff4
--------------------------36791b33cd7e514b
Content-Disposition: form-data; name="c"
buhman commented 6 years ago

curl -F 'c=@-;type=application/octet-stream' http://localhost:12345 < foo.png

POST / HTTP/1.1
Host: localhost:12345
User-Agent: curl/7.56.0
Accept: */*
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=------------------------267cb0d3ee42336e
Expect: 100-continue

3ff4
--------------------------267cb0d3ee42336e
Content-Disposition: form-data; name="c"
Content-Type: application/octet-stream

curl -s $(curl -F 'c=@-;type=application/octet-stream' 'https://ptpb.pw?u=1' < foo.png) | xxd | head

00000000: efbf bd50 4e47 0d0a 1a0a 0000 000d 4948  ...PNG........IH
buhman commented 6 years ago

So this is all actually caused by curl 7.56 / https://github.com/curl/curl/pull/1839 https://github.com/curl/curl/pull/1839/commits/e6ececb43f909b6f9465e40b12b6d77b59a5e01a switching to Transfer-Encoding: chunked, and wsgi/werkzeug's apparently broken handling of it.

Related: https://gist.github.com/mitsuhiko/5721547

buhman commented 6 years ago

https://github.com/pallets/werkzeug/issues/1149

If I try this in a development environment, I get:

$ curl -F 'c=@-' 'http://localhost:10002' < foo.png
status: no post content

It's a miracle that the production deployment is giving you a paste response at all (some interaction between nginx and/or uwsgi).

buhman commented 6 years ago

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_request_buffering

Default: proxy_request_buffering on;

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version

Default: proxy_http_version 1.0;

After doing some reading, I updated the production nginx configuration so that you now "correctly" get a 400 response when using chunked TE.

~As for client-side workarounds, the only thing I've managed to make work so far is to downgrade to curl 7.55.~ you can also just not use stdin, which generates non-chunked requests.

buhman commented 6 years ago

I was about to report this upstream, and it seems someone already has, for competitor ix.io: https://github.com/curl/curl/issues/1965

https://github.com/curl/curl/commit/14d6e207d35792e3e10a674b2e27cf2aa3370bf8

To read content from stdin instead of a file, use - as the filename. This goes for both @ and < constructs. Unfortunately it does not support reading the for both @ and < constructs. For this case, as well as for others in which the file from a named pipe or similar, as it needs the full size before the full data size cannot be determined before the transfer starts (as named pipes transfer starts or similar), data is transferred as chunks by HTTP and rejected by IMAP.

Confirmed, if you don't use stdin, curl generates a non-chunked request:

curl -F 'c=@foo.png' https://ptpb.pw/?u=1

As for actually fixing this, I need to drop/replace all of:

That's a complete rewrite.

Earnestly commented 6 years ago

That's quite the rabbit hole, nice work at uncovering the core issues. As you (and I) mentioned, I have been using files directly instead of stdin for the time being. Thanks for all your effort, hopefully curl can get this fixed soon.

buhman commented 6 years ago

Closing because curl reverted this behavior and fixing this is a rewrite anyway.

buhman commented 4 years ago

Fixed in https://github.com/ptpb/apocrypha/commit/d1a4c03b5f2e9f9b8d900e55dafa01972415c889