EOFError on multipart request

radar commented 1 year ago

I've been trying to track down recently why a particular request would cause a "400 Bad Request" HTTP status on Rack 2.2.7. The request comes from a 3rd party's code that I don't have access to, but I do have access to a curl script that they provided me with that generates a request in the same shape that their code does. Both result in the same 400 Bad Request response, or from Rack itself an EOFError.

The curl request is:

curl -XPOST -H "Host: localhost" -H "Accept: application/json" -u TEST:TEST -H "Content-Type: multipart/form-data; boundary=\"99d008a2-c9af-49f8-a1b2-aa3eec02f427\"" --data-binary "--99d008a2-c9af-49f8-a1b2-aa3eec02f427
Content-Type: text/plain; charset=utf-8
Content-Disposition: form-data; name=file; filename=batch.csv; filename*=utf-8''batch.csv

4000,AUD,redacted,,,,redacted,
--99d008a2-c9af-49f8-a1b2-aa3eec02f427--
" "http://localhost:9292/v1.0/batches/batch.csv"

And a simple Rack app that reproduces this issue:

run ->(env) { p Rack::Request.new(env).params; [200, {"Content-Type" => "text/html"}, ["Hello World!"]] }

My best guess here from playing around with the raw HTTP request itself is that the line endings on the multipart data are wrong, leading to a too-short Content-Type header.

I've got a Ruby script that manually constructs a valid request:

content_type = "text/csv"
filename = "batch.csv"
disposition = %Q(Content-Disposition: form-data; name=file; filename=#{filename}; filename*=utf-8''#{filename})

content = <<~CONTENT
--99d008a2-c9af-49f8-a1b2-aa3eec02f427
Content-Disposition: #{disposition}
Content-Type: #{content_type}

4000,AUD,redacted,,,,redacted,

--99d008a2-c9af-49f8-a1b2-aa3eec02f427--
CONTENT

content = content.gsub("\n", "\r\n")

request = <<~REQUEST
POST /v1.0/batches/any.csv HTTP/1.1
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: #{content.length}
User-Agent: HTTPie/3.2.2
Content-Type: multipart/form-data; boundary=99d008a2-c9af-49f8-a1b2-aa3eec02f427
Host: localhost:3000

REQUEST

request = request.gsub("\n", "\r\n")

request << content

File.write("bad-request", request)

Then I can pipe this through to Rack with: cat bad-request | nc localhost 9292

However, if I comment out:

content = content.gsub("\n", "\r\n")

And then run the Ruby script again, and pipe it to Rack, I will now start seeing these EOFError lines once again.

So a couple of questions:

Am I doing it wrong here?
Should Rack support the data-binary format as above, or am I doing illegal-according-to-the-spec things?
Is there a tool that I could use to validate raw HTTP requests are valid, so that I could've caught this sooner?

jeremyevans commented 1 year ago

This is expected behavior if you are using an incorrect line terminator (\n is incorrect, you must use \r\n, see RFC 1341 section 7.2.1). In rack 3, you get Rack::Multipart::EmptyContentError (a subclass of EOFError), to allow for more targeted rescuing.

Not related, and not an issue in your particular example, but when constructing the valid request, you should use content.bytesize instead of content.length, since the Content-Length should be in bytes and not characters.

I don't know of a tool to validate HTTP requests off the top of my head, but I'm guessing one exists.

I'll leave this open for a bit to allow for comments by another maintainer, in case I missed something.

ioquatix commented 3 months ago

I agree with @jeremyevans and I don't think we should do anything here. If we did decide to change the behaviour to be more lenient, we might end up causing more long term confusion. Perhaps as an alternative, we could introduce Rack::Multipart::Lint which is specifically aware of these issues and reports them.

Regarding specific answers:

Yes, I think you need to generate proper line endings "\r\n". Do I think it's overly verbose? Yes. But that's the standard.
I don't fully understand your quest, but binary data should be supported, but you will need to use proper encoding and line termination.
I'm not aware of any off the top of my head, that would validate multi-part bodies. But such a tool wouldn't be hard to create (multipart request bodies).

For further investigation/scope, I looked at other similar issues:

If you think introducing Rack::Mutipart::Lint is worth the effort, please feel free to open a PR.

rack / rack

EOFError on multipart request #2093