owasp-modsecurity / ModSecurity

ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx. It has a robust event-based programming language which provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis.
https://www.modsecurity.org
Apache License 2.0
7.67k stars 1.54k forks source link

Incorrect utf8toUnicode transformation for 00xx #3129

Open marcstern opened 3 weeks ago

marcstern commented 3 weeks ago

In utf8_unicode_inplace_ex(), we have the following code:

c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
    /* single byte unicode (7 bit ASCII equivilent) has no validation */
    count++;
    if (count <= len) {
        if (c == 0) *data = x2c(&c);
        else *data++ = c;
    }
}

The code if (c == 0) *data = x2c(&c); is wrong. utf the input string. If input is "x", we copy the character => correct. If input is "\x00\xA0", we call the x2c function which expects an hexadecimal string as input (like "20" for a space). Furthermore, the output buffer (data) is not increased, so the character is overwritten on the next loop. The correct code is

if (c == 0) {
    sprintf(data, "%%u%04x", utf[1]);
    count += 4;
    data += 6;
    i++;
    *changed = 1;
}

Remarks:

  1. Using "%04x" replaces a lot of useless code used in other parts of the code
  2. I didn't add a check if (count <= len) after count += 4; because it cannot overflow if we fix the code: len = input_len * 6 + 1; instead of len = input_len * 4 + 1; (4 bytes -> %u1234). The check (and the variable count) could thus be removed everywhere
  3. After filling data, we have several checks (/ invalid UTF-8 character number range (RFC 3629) /, / check for overlong /, ...) with this code count++; *data++ = c; What's the point? This copies the character after the ones we already copied. This looks wrong to me. Any reason to keep this code?