
vibe-d / vibe.d

Official vibe.d development

MIT License


non UTF-8 in Cookies #1192

Open: yannick opened this issue 9 years ago

yannick commented 9 years ago

I dump my whole cookies map into my msgpack log file. A few clients seem to send non-UTF-8 values, and these then corrupt my msgpack packets.

I guess the urlDecode call in parseCookies generates these. Or the headers need to be checked earlier.
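A minimal sketch of how percent-decoding alone can produce such values, assuming (as guessed above) that cookie values are URL-decoded: the header on the wire is pure ASCII, but `%E4` decodes to the single byte 0xE4, which is an ISO-8859-1 "ä" and not valid UTF-8 on its own. `percentDecode` and `isValidUTF8` are hypothetical helpers written for illustration, not vibe.d's `urlDecode`; only Phobos is used.

```d
import std.conv : to;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical minimal percent-decoder (NOT vibe.d's urlDecode):
/// turns every "%XX" escape into the raw byte 0xXX and copies all other
/// characters unchanged. Malformed hex digits throw a ConvException.
immutable(ubyte)[] percentDecode(string s)
{
    immutable(ubyte)[] result;
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] == '%' && i + 2 < s.length)
        {
            result ~= s[i + 1 .. i + 3].to!ubyte(16);
            i += 2; // skip the two hex digits
        }
        else
            result ~= cast(ubyte) s[i];
    }
    return result;
}

/// Returns true if the bytes form well-formed UTF-8.
bool isValidUTF8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    assert(percentDecode("all%3Dtrue") == "all=true".representation);
    // UTF-8 "ä" (C3 A4) decodes to valid UTF-8 ...
    assert(isValidUTF8(percentDecode("M%C3%A4rz")));
    // ... but an ISO-8859-1 "ä" (E4) does not.
    assert(!isValidUTF8(percentDecode("M%E4rz")));
}
```

So checking validity only on the raw header would not be enough; the check has to happen after decoding.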

yannick commented 9 years ago

Not 100% sure if this is a bug, but what I have tried so far is sending a weird request like the following to the http_info example, and the result is broken template output.

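    import vibe.core.log : logInfo;
    import vibe.http.client : requestHTTP;
    import vibe.stream.operations : readAllUTF8;

    requestHTTP("http://127.0.0.1:8080/",
        (scope req) {
            // invalid UTF-8: overlong encodings of '/' (U+002F) taken from UTF-8-test.txt
            req.headers["Cookie"] = "all=true" ~ cast(string) [192, 175, 224, 128, 175, 240, 128, 128, 175, 248, 128, 128, 128, 175, 252, 128, 128, 128, 128, 175];
        },
        (scope res) {
            logInfo(res.bodyReader.readAllUTF8());
            logInfo("Response: %d", res.statusCode);
            foreach (k, v; res.headers)
                logInfo("Header: %s: %s", k, v);
        }
    );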

Maybe all headers should run through sanitization?
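One way to do that, sketched with Phobos' `std.encoding.sanitize`, which returns its input unchanged when it is already valid and otherwise replaces each malformed sequence with the replacement character U+FFFD. `sanitizeHeaders` is a hypothetical helper on a plain `string[string]`; vibe.d's own header container type and whatever sanitization it may offer are not assumed here.

```d
import std.algorithm.searching : canFind;
import std.encoding : isValid, sanitize;

/// Hypothetical helper: returns a copy of `headers` in which every name and
/// value is guaranteed to be valid UTF-8. Malformed sequences become U+FFFD.
string[string] sanitizeHeaders(string[string] headers)
{
    string[string] result;
    foreach (name, value; headers)
        result[sanitize(name)] = sanitize(value);
    return result;
}

unittest
{
    // "\xE4" is a raw ISO-8859-1 "ä" byte, i.e. invalid UTF-8.
    auto cleaned = sanitizeHeaders(["Cookie": "tz=Mitteleurop\xE4ische"]);
    assert(isValid(cleaned["Cookie"]));
    assert(cleaned["Cookie"].canFind("\uFFFD"));
    // Valid values are passed through unchanged.
    assert(sanitizeHeaders(["Host": "example.com"])["Host"] == "example.com");
}
```

The trade-off is that the original bytes are lost, so a value that was really ISO-8859-1 can no longer be recovered after this step.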

I don't understand much about UTF-8, so I just copied some byte sequences from http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

etcimon commented 9 years ago

I think a binary cookie would usually have to be base64-encoded.
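A sketch of the base64 approach with Phobos' `std.base64`. It uses the URL-safe alphabet (`Base64URL`: `-` and `_` instead of `+` and `/`) so that a decoding step that treats `+` as a space, as form decoding does, cannot alter the value; that choice is an assumption, since RFC 6265 cookie values also permit the standard alphabet. `encodeCookieValue` and `decodeCookieValue` are hypothetical names.

```d
import std.algorithm.searching : all;
import std.base64 : Base64URL;
import std.string : representation;

/// Hypothetical helper: encode arbitrary bytes into an ASCII-only cookie value.
string encodeCookieValue(const(ubyte)[] raw)
{
    return Base64URL.encode(raw).idup;
}

/// Hypothetical helper: recover the original bytes; throws a
/// Base64Exception if the value is not valid base64.
immutable(ubyte)[] decodeCookieValue(string value)
{
    return Base64URL.decode(value).idup;
}

unittest
{
    auto raw = "Mitteleuropäische Sommerzeit".representation;
    auto cookie = encodeCookieValue(raw);
    assert(cookie.representation.all!(b => b < 0x80)); // pure ASCII on the wire
    assert(decodeCookieValue(cookie) == raw);          // lossless round trip
    assert(encodeCookieValue([0xFF]) == "_w==");
}
```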

http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding

The only solution I can think of (for non-conforming clients) is to fail in the header parser if a character is invalid. It is likely due to an attack, unless some piece of software has a bug that registers invalid cookies in the first place.
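A minimal sketch of such a strict parser check, assuming the rule "only visible US-ASCII (VCHAR, 0x21-0x7E), space and horizontal tab are allowed". RFC 7230 additionally tolerates `obs-text` (0x80-0xFF) and asks recipients to treat it as opaque data, so this rule is stricter than the RFC. `isStrictFieldValue` is a hypothetical name; turning a rejection into an error response is left out.

```d
import std.algorithm.searching : all;
import std.string : representation;

/// Hypothetical strict check for a header field value. It inspects raw bytes
/// via `representation`, so invalid UTF-8 cannot cause an exception here
/// (iterating the `const(char)[]` directly would auto-decode it).
bool isStrictFieldValue(const(char)[] value)
{
    return value.representation.all!(b => (b >= 0x21 && b <= 0x7E) || b == ' ' || b == '\t');
}

unittest
{
    assert(isStrictFieldValue("all=true; lang=en"));
    assert(!isStrictFieldValue("tz=Mitteleurop\xE4ische")); // ISO-8859-1 byte
    assert(!isStrictFieldValue("tz=Mitteleuropäische"));     // UTF-8 bytes
}
```

Note that this rejects the "Mitteleuropäische Sommerzeit" cookie in either encoding, which is the intended effect of failing early.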

yannick commented 9 years ago

I started to investigate this whole thing when the msgpack decoder failed for certain log frames I had saved. It turned out that there were cookies containing a date with the string "Mitteleuropäische Sommerzeit" (German for "Central European Summer Time"). It's probably a broken proxy or some plugin.

From what I understand from https://tools.ietf.org/html/rfc7230#section-3.2.4, all headers should be parsed as US-ASCII, or at most as ISO-8859-1. But then what happens with URLs that contain UTF-8 characters?
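On the URL question: the conventional answer (RFC 3986, and RFC 3987 for internationalized identifiers) is that non-ASCII characters are not sent raw in a request target. They are encoded as UTF-8 and then percent-encoded, which keeps the request line pure US-ASCII. Phobos' `std.uri.encode` and `std.uri.decode` illustrate the round trip; the path `/straße` is just an example.

```d
import std.algorithm.searching : all;
import std.string : representation;
import std.uri : decode, encode;

unittest
{
    // "ß" (U+00DF) is UTF-8 C3 9F and is percent-encoded on the wire.
    immutable target = encode("/straße");
    assert(target == "/stra%C3%9Fe");
    assert(target.representation.all!(b => b < 0x80)); // pure US-ASCII
    // The server decodes it back to the UTF-8 path.
    assert(decode(target) == "/straße");
}
```

Clients that put raw UTF-8 bytes into the request line fall outside this scheme, which is where a server has to decide how lenient to be.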

yannick commented 9 years ago

The problem seems to be that these days most servers accept UTF-8 requests, and people have even started to use UTF-8 in GET URLs. A possible way out: call sanitize; if that throws, try to convert from ISO-8859-1 to UTF-8, and only if that also fails, reject the request. I'll try to make a PR that makes this strategy an optional setting.
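The strategy above can be sketched as follows, assuming "sanitize" means "validate as UTF-8" (Phobos' `std.utf.validate` throws a `UTFException` on malformed input). Because every byte has an ISO-8859-1 interpretation, the conversion step cannot fail by itself; the sketch therefore uses an assumed rule, rejecting C0/C1 control characters, as the point where the request would be refused. `normalizeHeaderValue` is a hypothetical name, not part of vibe.d.

```d
import std.array : appender;
import std.string : representation;
import std.typecons : Nullable, nullable;
import std.utf : UTFException, validate;

/// Hypothetical sketch of the fallback strategy (not an existing vibe.d API):
///   1. keep the value if it already is valid UTF-8;
///   2. otherwise reinterpret the bytes as ISO-8859-1 and re-encode as UTF-8;
///   3. reject (return null) if the result contains control characters,
///      an assumed policy standing in for "only if that also fails".
Nullable!string normalizeHeaderValue(const(char)[] raw)
{
    string text;
    try
    {
        validate(raw);
        text = raw.idup;
    }
    catch (UTFException)
    {
        // ISO-8859-1 maps byte 0xNN to code point U+00NN, so this never fails.
        auto app = appender!string();
        foreach (b; raw.representation)
            app.put(cast(dchar) b);
        text = app.data;
    }
    foreach (dchar c; text)
        if ((c < 0x20 && c != '\t') || (c >= 0x7F && c <= 0x9F))
            return Nullable!string.init;
    return nullable(text);
}

unittest
{
    // Valid UTF-8 passes through unchanged.
    assert(normalizeHeaderValue("tz=Mitteleuropäische").get == "tz=Mitteleuropäische");
    // A raw ISO-8859-1 "ä" (0xE4) is converted to UTF-8 "ä" (C3 A4).
    assert(normalizeHeaderValue("tz=Mitteleurop\xE4ische").get == "tz=Mitteleuropäische");
    // Control characters are rejected.
    assert(normalizeHeaderValue("a\x01b").isNull);
}
```

Two caveats of this design: ISO-8859-1 text that happens to be valid UTF-8 (the bytes C3 A4, i.e. "Ã¤") is taken as UTF-8 "ä", and rejecting U+0080-U+009F also refuses Windows-1252 input, where those bytes are printable characters.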