zigzap / zap

blazingly fast backends in zig
MIT License
1.99k stars 71 forks source link

SimpleRequest.body doesn't return the full body when the body has binary data #41

Closed edyu closed 9 months ago

edyu commented 9 months ago

When I need the raw HTTP body to parse multipart form data which can contain binary data such as image data, SimpleRequest.body would not contain the full body. For example, I have a request that has 2661 bytes (content-length is set by the browser), However, the SimpleRequest.body.len is only 157 bytes. This would only happen when an image file such as PNG is embedded. If a text file is embedded, it would work properly.

The following is the problem (with my own debugging code output but it illustrates the problem). Because the request cuts off the actual data, when I try to access the data in the request using the []const u8 body, it would complain about out-of-bounds error.

upload.image: length=2661
upload.image: content-type=multipart/form-data; boundary=----WebKitFormBoundaryC4LA6TmfoOku2M3F
upload.image: boundary=----WebKitFormBoundaryC4LA6TmfoOku2M3F
saving to public/img
body size is 157 == 2661?
body is { 45, 45, 45, 45, 45, 45, 87, 101, 98, 75, 105, 116, 70, 111, 114, 109, 66, 111, 117, 110, 100, 97, 114, 121, 67, 52, 76, 65, 54, 84, 
109, 102, 111, 79, 107, 117, 50, 77, 51, 70, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 68, 105, 115, 112, 111, 115, 105, 116, 105, 111, 
110, 58, 32, 102, 111, 114, 109, 45, 100, 97, 116, 97, 59, 32, 110, 97, 109, 101, 61, 34, 115, 101, 114, 118, 101, 114, 95, 105, 109, 97, 103, 101, 34, 59, 32, 102, 105, 108, 101, 110, 97, 109, 101, 61, 34, 116, 101, 115, 116, 45, 115, 109, 97, 108, 108, 46, 112, 110, 103, 34, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 84, 121, 112, 101, 58, 32, 105, 109, 97, 103, 101, 47, 112, 110, 103, 13, 10, 13, 10, 137, 80, 78, 71, 
13, 10, 26, 10 }
skipped: ------WebKitFormBoundaryC4LA6TmfoOku2M3F
name=server_image
filename=test-small.png
Content-Type=image/png
thread 644033 panic: index out of bounds: index 2659, len 157
/home/edyu/ws/zig/myprogram/src/server_endpoint.zig:172:52: 0x2e6b45 in parseMultiPart (myprogram)
                    if (!std.mem.eql(u8, stop, body[end + 2 .. length - 2])) {
                                                   ^
edyu commented 9 months ago

To explain the debug output. The Content-Length is 2661 bytes but SimpleRequest.body.len shows only 157 bytes. This problem won't show up if the embedded file is text. I have a feeling there is some kind of C -> Zig error.

edyu commented 9 months ago

I can verify that facil.io indeed has the full body in the request. The binary data is causing zero-delimited sentinel to fail. I now use body.ptr[0..length] instead of Simple.Request.body to properly parse body for multipart form. Of course I'm duplicating the work facil.io already does fio.http_parse_body(self.h) but I don't know how to retrieve the result of SimpleRequest.parseBody() for multipart form. Not sure how to get it from parameters because the value of the parameter is shown to be null when I use SimpleRequest.parseBody() and then getParametersToOwnedSlices().

The current code in Zap is assigning .body = util.fio2str(r.*.body), I believe the proper way is to get the length from "content-length" and then calling r.*.body[0..length] because there could be null (zero) in the body.

renerocksai commented 9 months ago

You're right. The current implementation using fio2str assuming text-data. Mime-encoded 'binary' data would be handled by that. But real binary data cannot. This is a flaw and requires a bit of thought to get it fixed. We may be lucky and your proposed fix will work without side-effects. Then, at least you should be able to use the SimpleRequest.body. Using the length from 'content-length' may be problematic as it would open zap up to potential segfault / panic attacks by pretending to send n bytes in content-length but actually sending less bytes -> zap tries to access random memory. So, the body length must be somewhere in facilio - we should use that IMHO

renerocksai commented 9 months ago

OK, I fixed the SimpleRequest.body (in the bindata branch FYI). Alas, when trying to access the binary parameter, facilio seems to treat the value as a hashmap which is not supported by zap. I'm investigating...

[EDIT]: OK, seems non-trivial. When I receive a binary file via curl in a multipart form, facilio creates a hashmap containing filename, mimetype, and data. The data itself seems to "must be" read() from, apparently. I'll try to wrap this so all you get are u8 slices.

renerocksai commented 9 months ago

Dammit. When I try to read from the data, it doesn't work, and pretends to be 0 bytes long. Check out the bindata branch if you like to check what I might have f-ed up.

edyu commented 9 months ago

@renerocksai thanks for keeping at it. I'll take a look. I agree with not being able to trust "Content-Length" for attacks but there really is no other way unless facil.io can count the number of bytes coming in the pipe directly. I see a big portion of the code in facil.io parsing the form-data but I just did my own parsing. One stop gap solution would be checking the content-type and only do the body parsing and "trusting" content-length if we get the special binary form-data content-type. The problem is what does the server do by default if it receives the data? One way is to register a callback so that the callback is called for each binary file parsed and the callback can do whatever (like saving the file which is what I do right now.

renerocksai commented 9 months ago

See https://github.com/zigzap/zap/blob/master/examples/bindataformpost/bindataformpost.zig

There, I iterate over the parametersToOwnedList and check the tag of the params. If it's a binary file, I print it. You could add all other types or an else branch there according to your needs. It might be a bigger pain in the a.. having to check them all but in the else branch you know it's not a file so you could getParamStr(paramname) and retrieve the string version as usual

renerocksai commented 9 months ago

in fact, I just updated the example to showcase this usecase