ptpb / pb

pb is a formerly-lightweight pastebin and url shortener

Using form creates "DOS" file with CRLF ended lines #203

Closed simmel closed 7 years ago

simmel commented 7 years ago

Heyo!

When I use the form to paste anything (I've tried text selected in Firefox and in a terminal), the resulting paste is a "DOS" document, i.e. it has CRLF line endings.

Example:

$ curl https://ptpb.pw/pHQI | head -n1 | hexdump -C 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   152  100   152    0     0    214      0 --:--:-- --:--:-- --:--:--   214
00000000  70 62 20 69 73 20 61 20  6c 69 67 68 74 77 65 69  |pb is a lightwei|
00000010  67 68 74 20 70 61 73 74  65 62 69 6e 20 61 6e 64  |ght pastebin and|
00000020  20 75 72 6c 20 73 68 6f  72 74 65 6e 65 72 20 62  | url shortener b|
00000030  75 69 6c 74 20 75 73 69  6e 67 20 66 6c 61 73 6b  |uilt using flask|
00000040  2e 0d 0a                                          |...|
00000043

0x0d is CR and 0x0a is LF, see http://www.asciitable.com/

Note that pasting via curl does NOT create CRLF line endings; those lines end in LF.
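The hexdump check above can also be done programmatically. The following is a minimal sketch, assuming the paste has already been downloaded into a `bytes` object; `count_line_endings` is a hypothetical helper name, not part of pb or curl.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR bytes in a downloaded paste."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one LF and one CR, so subtract the pairs
    # to get the number of bare LF and bare CR bytes.
    bare_lf = data.count(b"\n") - crlf
    bare_cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": bare_lf, "cr": bare_cr}


# The first line of the example paste, as shown in the hexdump.
sample = b"pb is a lightweight pastebin and url shortener built using flask.\r\n"
print(count_line_endings(sample))
```

A paste created through the form would be expected to report only `crlf` endings, while one uploaded with curl would report only `lf`.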

buhman commented 7 years ago

I can reproduce this, but it seems to be a browser feature:

https://tools.ietf.org/html/rfc1867

As with all MIME transmissions, CRLF is used as the separator for lines in a POST of the data in multipart/form-data.

https://tools.ietf.org/html/rfc2388
https://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2
https://stackoverflow.com/questions/6963480/firefox-and-chrome-replacing-lf-with-crlf-during-post

buhman commented 7 years ago

This can't be fixed by pb, because the data is mangled by the client/browser before being sent, and pb only faithfully stores the paste byte-for-byte with no other interpretation or transformation.
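Since pb stores the bytes as received, anyone who needs LF line endings has to convert on the client side after downloading. A minimal sketch, assuming the paste is text and already in memory as `bytes`; `crlf_to_lf` is a hypothetical helper, not something pb provides.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving every other byte untouched."""
    return data.replace(b"\r\n", b"\n")


# First line of the example paste from the report above.
downloaded = b"pb is a lightweight pastebin and url shortener built using flask.\r\n"
converted = crlf_to_lf(downloaded)
print(converted.endswith(b"flask.\n"))
```

This should only be applied to text pastes; running it over binary data would corrupt any byte sequence that happens to contain 0x0d 0x0a.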

buhman commented 7 years ago

This is actually standard behavior specified for HTML forms, so it is not even a bug:

https://www.w3.org/TR/html5/forms.html#the-textarea-element

The element's value is defined to be the element's raw value with the following transformation applied:

Replace every occurrence of a "CR" (U+000D) character not followed by a "LF" (U+000A) character, and every occurrence of a "LF" (U+000A) character not preceded by a "CR" (U+000D) character, by a two-character string consisting of a U+000D CARRIAGE RETURN "CRLF" (U+000A) character pair.

In fact, if anything, curl's behavior is arguably the incorrect one, since it does not apply this transformation to form fields.
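The textarea transformation quoted above can be sketched in a few lines of Python. This is an illustration of the rule as quoted, not code taken from any browser or from pb; `normalize_textarea_value` is a hypothetical name.

```python
import re


def normalize_textarea_value(raw: str) -> str:
    """Replace every lone CR and every lone LF with a CRLF pair.

    The alternation tries "\r\n" first, so an existing CRLF pair is matched
    as one unit and replaced by itself rather than being doubled.
    """
    return re.sub(r"\r\n|\r|\n", "\r\n", raw)


print(repr(normalize_textarea_value("line one\nline two\n")))
```

Text typed or pasted with LF endings therefore leaves the browser with CRLF endings, which matches what the hexdump in the original report shows.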