ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx. It has a robust event-based programming language which provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis.
In utf8_unicode_inplace_ex(), we have the following code:
c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
/* single byte unicode (7 bit ASCII equivilent) has no validation */
count++;
if (count <= len) {
if (c == 0) *data = x2c(&c);
else *data++ = c;
}
}
The code if (c == 0) *data = x2c(&c); is wrong.
utf the input string.
If input is "x", we copy the character => correct.
If input is "\x00\xA0", we call the x2c function which expects an hexadecimal string as input (like "20" for a space).
Furthermore, the output buffer (data) is not increased, so the character is overwritten on the next loop.
The correct code is
if (c == 0) {
sprintf(data, "%%u%04x", utf[1]);
count += 4;
data += 6;
i++;
*changed = 1;
}
Remarks:
Using "%04x" replaces a lot of useless code used in other parts of the code
I didn't add a check if (count <= len) after count += 4; because it cannot overflow if we fix the code: len = input_len * 6 + 1; instead of len = input_len * 4 + 1; (4 bytes -> %u1234). The check (and the variable count) could thus be removed everywhere
After filling data, we have several checks (/ invalid UTF-8 character number range (RFC 3629) /, / check for overlong /, ...) with this code count++; *data++ = c; What's the point? This copies the character after the ones we already copied. This looks wrong to me. Any reason to keep this code?
In utf8_unicode_inplace_ex(), we have the following code:
The code
if (c == 0) *data = x2c(&c);
is wrong. utf the input string. If input is "x", we copy the character => correct. If input is "\x00\xA0", we call the x2c function which expects an hexadecimal string as input (like "20" for a space). Furthermore, the output buffer (data) is not increased, so the character is overwritten on the next loop. The correct code isRemarks:
if (count <= len)
aftercount += 4;
because it cannot overflow if we fix the code:len = input_len * 6 + 1;
instead oflen = input_len * 4 + 1;
(4 bytes -> %u1234). The check (and the variable count) could thus be removed everywherecount++; *data++ = c;
What's the point? This copies the character after the ones we already copied. This looks wrong to me. Any reason to keep this code?