leahneukirchen / mblaze

Unix utilities to deal with Maildir

Confusing mshow behaviour with multipart/form-data #212

Closed: Earnestly closed this issue 3 years ago

Earnestly commented 3 years ago

Version: https://github.com/leahneukirchen/mblaze/commit/4ccf2f08c1aa8b15f31ac469edebe6c4710d74f1

For full disclosure, I'm attempting to reuse mshow for POST data instead of email, so perhaps this is expected, but I was hoping you might have some insight into the behaviour I'm seeing.

Here is an example file I am working with; it includes CRLFs (I'm not sure whether GitHub strips them):

Content-Type: multipart/form-data; boundary=------------------------55a586f81559face

--------------------------55a586f81559face
Content-Disposition: form-data; name="a"; filename="foo"
Content-Type: application/octet-stream

foo

--------------------------55a586f81559face
Content-Disposition: form-data; name="a"; filename="bar"
Content-Type: application/octet-stream

bar

--------------------------55a586f81559face--

When in this form it appears to work as I would expect:

% mshow -t - < example
/dev/stdin
  1: multipart/form-data size=346
    2: application/octet-stream size=4 name="foo"
    3: application/octet-stream size=4 name="bar"
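Since GitHub may strip the CRLFs from the listing above, a file with explicit CRLF line endings can be regenerated with a short Python sketch like the one below. The function name `build_example` and the exact byte layout (an empty line after each body, a final CRLF after the closing delimiter) are assumptions reconstructed from the listing, not taken from the original attachment.

```python
# Hypothetical helper: rebuild the multipart/form-data example with CRLF line
# endings. Layout is reconstructed from the listing, not the original file.
BOUNDARY = "-" * 24 + "55a586f81559face"  # delimiter lines add two more dashes


def build_example(parts):
    """Return the example as bytes; parts is a list of (filename, body) pairs."""
    lines = [f"Content-Type: multipart/form-data; boundary={BOUNDARY}", ""]
    for filename, body in parts:
        lines += [
            f"--{BOUNDARY}",
            f'Content-Disposition: form-data; name="a"; filename="{filename}"',
            "Content-Type: application/octet-stream",
            "",
            body,
            "",  # the empty line shown before each delimiter in the listing
        ]
    lines.append(f"--{BOUNDARY}--")
    # Join every line with CRLF and terminate the last one as well.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```

Writing `build_example([("foo", "foo"), ("bar", "bar")])` to a file opened in binary mode gives something to feed to `mshow -t - < example`. Using a body such as `"foo\n\nbaz"` (bare LF blank line inside CRLF-delimited content) should exercise the `\n\n` case diagnosed further down the thread.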

However, if any of the content contains an extra newline, mshow appears to get thrown off.

Content-Type: multipart/form-data; boundary=------------------------55a586f81559face

--------------------------55a586f81559face
Content-Disposition: form-data; name="a"; filename="foo"
Content-Type: application/octet-stream

foo

baz ^ newline

--------------------------55a586f81559face
Content-Disposition: form-data; name="a"; filename="bar"
Content-Type: application/octet-stream

bar

--------------------------55a586f81559face--
% mshow -t - < example
/dev/stdin
  1: multipart/form-data size=212 name="foo"
    2: application/octet-stream size=4 name="bar"

% mshow -x - < example
foo
bar

% head -n-0 foo bar
==> foo <==
baz ^ newline

--------------------------55a586f81559face
Content-Disposition: form-data; name="a"; filename="bar"
Content-Type: application/octet-stream

bar

--------------------------55a586f81559face--

==> bar <==
bar

If I include a newline in bar, mshow produces an empty file.

Am I doing anything wrong, or is mshow not appropriate for POST multipart/form-data?

Edit: A quick attempt with Python's email module gives me this:

>>> import email
>>> f = open('example')
>>> m = email.message_from_file(f)
>>> for p in m.get_payload(): print(p)
...
Content-Disposition: form-data; name="a"; filename="foo"
Content-Type: application/octet-stream

foo

baz ^ newline

Content-Disposition: form-data; name="a"; filename="bar"
Content-Type: application/octet-stream

bar

>>> for p in m.get_payload(): print(p.get_payload(decode=True))
... 
b'foo\n\nbaz ^ newline\n'
b'bar\n\n'
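The session above can be turned into a small script. This sketch reads the file in binary mode with `email.message_from_bytes`, so the CRLFs are kept as-is rather than being translated to `\n` by text-mode I/O as in the session; the function name `form_parts` is hypothetical.

```python
import email
import sys


def form_parts(raw):
    """Return (filename, decoded body) for each part of a multipart message."""
    msg = email.message_from_bytes(raw)
    # For a multipart message, get_payload() returns the list of subparts.
    return [(part.get_filename(), part.get_payload(decode=True))
            for part in msg.get_payload()]


if __name__ == "__main__":
    # Usage: python3 parts.py example [more files...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for filename, body in form_parts(f.read()):
                print(filename, body)
```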
leahneukirchen commented 3 years ago

Please upload the payload somewhere binary-safe (e.g. an attachment here), I can't reproduce from your description.

leahneukirchen commented 3 years ago

Or are you getting fooled by your tty where \r causes carriage return?

Earnestly commented 3 years ago

I believe 0x0 should keep the content intact: https://0x0.st/-t59.txt

The paste is of the working example; to replicate the issue, you only have to add a newline above or below the existing content.

I don't think I'm being fooled, although it is possible; I do try to check with either cat -A or sed -n l when in doubt (or throw od at it).
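A rough Python counterpart to `cat -A` or `sed -n l` for this purpose: it counts CRLF versus bare LF terminators and prints each line's `repr`, so a `\r` cannot be hidden by the terminal. The helper names `line_endings` and `visible_lines` are made up for this example.

```python
import sys


def line_endings(data):
    """Return (CRLF count, bare LF count) for a bytes buffer."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # every CRLF contains exactly one LF
    return crlf, bare_lf


def visible_lines(data):
    """Yield each line with its terminator spelled out, similar to `sed -n l`."""
    for line in data.splitlines(keepends=True):
        yield repr(line)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        crlf, bare_lf = line_endings(data)
        print(f"{path}: {crlf} CRLF, {bare_lf} bare LF")
        for shown in visible_lines(data):
            print("  " + shown)
```

A mix of both counts in one file is exactly the situation this issue turns on.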

Oh, I forgot that GitHub can do attachments. Perhaps redundant, but it may last longer in the end: foo.txt

leahneukirchen commented 3 years ago

Those are the cases that work fine, tho?

leahneukirchen commented 3 years ago

Ok, I managed to edit it to trigger the bug.

leahneukirchen commented 3 years ago

OK, a \n\n mistakenly triggers the end of the header in blaze822_mem. Let me think of a solution.
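A hypothetical Python sketch of this failure mode (not the actual blaze822_mem code, nor the change in PR #213): searching the whole buffer for a bare `\n\n` skips past a CRLF header block, whose terminator is `\r\n\r\n`, and stops at the blank line inside the first form part instead. That would be consistent with part 1 being reported with `name="foo"` earlier in the thread. Taking whichever terminator occurs first avoids it.

```python
def header_end_naive(buf):
    """Offset just past the header block, looking only for LF LF (buggy for CRLF input)."""
    i = buf.find(b"\n\n")
    return -1 if i < 0 else i + 2


def header_end(buf):
    """Offset just past the header block, using whichever terminator appears first."""
    candidates = []
    i = buf.find(b"\n\n")
    if i >= 0:
        candidates.append(i + 2)
    j = buf.find(b"\r\n\r\n")
    if j >= 0:
        candidates.append(j + 4)
    return min(candidates) if candidates else -1


if __name__ == "__main__":
    # CRLF headers, but the first form part's body contains a bare "\n\n".
    msg = (b"Content-Type: multipart/form-data; boundary=XYZ\r\n\r\n"
           b"--XYZ\r\n"
           b'Content-Disposition: form-data; name="a"; filename="foo"\r\n\r\n'
           b"foo\n\nbaz\r\n"
           b"--XYZ--\r\n")
    print(msg[:header_end_naive(msg)])  # runs on into the form part's headers
    print(msg[:header_end(msg)])        # only the top-level Content-Type header
```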

leahneukirchen commented 3 years ago

Please test.

Earnestly commented 3 years ago

With that PR it appears to work with mshow -x: all the files are created with the correct content. However, mshow -O appears to always print the headers (which happen to include the content) before finally printing the content itself, e.g.

Content-Type: multipart/form-data; boundary=----WebKitFormBoundary0CUPAVuuy9FgOBzU
Content-Length: 356

------WebKitFormBoundary0CUPAVuuy9FgOBzU
Content-Disposition: form-data; name="post"; filename="bar"
Content-Type: application/octet-stream

bar

------WebKitFormBoundary0CUPAVuuy9FgOBzU
Content-Disposition: form-data; name="post"; filename="foo"
Content-Type: application/octet-stream

foo

newline

------WebKitFormBoundary0CUPAVuuy9FgOBzU--

Produces:

< out ~/mblaze/mshow -O -
------WebKitFormBoundary0CUPAVuuy9FgOBzU
Content-Disposition: form-data; name="post"; filename="bar"
Content-Type: application/octet-stream

bar

------WebKitFormBoundary0CUPAVuuy9FgOBzU
Content-Disposition: form-data; name="post"; filename="foo"
Content-Type: application/octet-stream

foo

newline

------WebKitFormBoundary0CUPAVuuy9FgOBzU--

bar
foo

newline

Is this intentional? (I tried various flags listed in the man page but didn't find anything that changes this behaviour.)

leahneukirchen commented 3 years ago

mshow will parse three parts: the multipart/form-data container and the two form parts. -O prints all of them; use mshow -O ./file 2 3 (etc.) to show only the form parts.
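For this flat message, the numbering appears to follow a depth-first walk with the container as part 1. Python's `Message.walk()` visits parts in that order, so a loose analogue of `mshow -t` and of `mshow -O ./file 2 3` might look like the sketch below. It is an approximation for illustration, not how mshow is implemented, and the function names are hypothetical.

```python
import email


def list_parts(raw):
    """Number parts container-first, like the mshow -t listing for this message."""
    msg = email.message_from_bytes(raw)
    return [(n, part.get_content_type(), part.get_filename())
            for n, part in enumerate(msg.walk(), start=1)]


def bodies(raw, wanted):
    """Decoded bodies of the selected parts, roughly like `mshow -O file 2 3`.

    Multipart containers are skipped: their payload is a list of subparts,
    so get_payload(decode=True) would return None for them.
    """
    msg = email.message_from_bytes(raw)
    return [part.get_payload(decode=True)
            for n, part in enumerate(msg.walk(), start=1)
            if n in wanted and not part.is_multipart()]
```

Unlike mshow -O, this skips part 1 entirely, which is why the raw form data is not printed first.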

Earnestly commented 3 years ago

Ah, so it does. This now all appears to work wonderfully. I'll close the issue, as PR #213 fixes everything as far as my tests go.

Thank you