owasp-modsecurity / ModSecurity


Discussion about 'hostname' field in log #3200

Open airween opened 1 month ago

airween commented 1 month ago

Describe the bug

Libmodsecurity3 produces log entries (through a callback function, e.g. for Nginx) with an unusable [hostname] field. The [hostname] field always contains the IP address of the server, which has no informational value.

Logs and dumps

An example:

ModSecurity: Warning. detected SQLi using libinjection. [file "/usr/share/modsecurity-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "46"] [id "942100"] ... [tag "PCI/6.5.2"] [hostname "18.19.20.21"] [uri "/xmlrpc.php"]... client: 91.92.93.94, server: www.myserver.com, request: "POST /xmlrpc.php HTTP/1.1", host: "www.myserver.com"

As you can see, the log contains the server and host fields at the end of the line. Unfortunately, the length of the line is limited (it is hard-coded in Nginx to 2048 bytes), so if the request is too long (e.g. a GET request with a very long query string), these fields are cut off.
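The truncation described above can be illustrated with a small Python simulation. This is only a sketch: the 2048-byte limit is the figure quoted in this issue, the log layout is simplified, and `build_log_line` and `truncate` are hypothetical helpers rather than Nginx or ModSecurity code. In this simplified layout only the request line is long, so the trailing `host:` field is lost while the earlier `[hostname]` field survives.

```python
NGINX_LOG_LIMIT = 2048  # byte limit quoted in this issue; treated as an assumption here


def build_log_line(hostname: str, uri: str, query: str, host_header: str) -> str:
    """Build a simplified log line shaped like the example in this issue."""
    return (
        f'ModSecurity: Warning. [id "942100"] [hostname "{hostname}"] [uri "{uri}"], '
        f'client: 91.92.93.94, server: {host_header}, '
        f'request: "GET {uri}?{query} HTTP/1.1", host: "{host_header}"'
    )


def truncate(line: str, limit: int = NGINX_LOG_LIMIT) -> str:
    """Cut the line at `limit` bytes, as a fixed-size log buffer would."""
    return line.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


short = truncate(build_log_line("18.19.20.21", "/xmlrpc.php", "a=1", "www.myserver.com"))
long_ = truncate(build_log_line("18.19.20.21", "/xmlrpc.php", "a=" + "x" * 3000, "www.myserver.com"))

print('host: "www.myserver.com"' in short)  # True
print('host: "www.myserver.com"' in long_)  # False
print('[hostname "18.19.20.21"]' in long_)  # True
```

If a long value appears earlier in the line, the `[hostname]` field itself can be cut off as well, which is why a usable value in that field does not remove the length problem.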

To Reproduce

Send any invalid request that triggers a rule and produces a log entry.

Expected behavior

As with Apache's mod_security2 module, the hostname field should contain the correct value.

Server (please complete the following information):

Additional context

There was a previous attempt to fix this issue:

https://github.com/owasp-modsecurity/ModSecurity/pull/2906

but it was rejected (see the discussion). The first objection was:

"The 'Host' header is untrusted user input." That is a bit surprising, because when I reviewed the code again I found that the variable SERVER_NAME (reference) is also produced from the Host header, without any filtering (see the source).

In any case, I would like to discuss how we can solve this issue.

My suggestions:

I also created a patch for the Nginx connector; here is the result:

curl -v -H "Host: modsecurity.org" http://localhost/?q=/bin/bash

the log:

ModSecurity: Warning. Matched "Operator `ValidateByteRange' ... [tag "capec/1000/210/272"] [hostname "modsecurity.org"] [uri "/"] ... client: ::1, server: _, request: "GET /?q=/bin/bash HTTP/1.1", host: "modsecurity.org"

Note that the connector uses Nginx's r->headers_in.server variable (see Nginx's source).

Please share your thoughts on this behavior and on possible solutions.

theseion commented 1 month ago

So with your change, the hostname log entry would become usable, but might still be cut off. That will have to be solved later, correct?

Do I understand correctly, that using a directive like server_name would now set the server name for the log?

dune73 commented 1 month ago

I think the patch would be fine. As for the alternative approach with the application, would that be the connector that calls it?

airween commented 1 month ago

> So with your change, the hostname log entry would become usable, but might still be cut off. That will have to be solved later, correct?

Yes, exactly. Later, and not in the library but in the application that uses the library (namely the Nginx connector).

But we have to be careful: if we add this method to the Nginx connector, it won't work with older library versions, so we have to check whether the function exists. The best way to control this is to check the ModSecurity version, as is done in other cases (see this example).

This is why it would be good to add this feature soon: we can add it to the connector after the release.

> Do I understand correctly, that using a directive like server_name would now set the server name for the log?

Yes, I realized that too in the meantime.

Thanks.

airween commented 1 month ago

> I think the patch would be fine. As for the alternative approach with the application, would that be the connector that calls it?

Exactly, with a version check (see my previous comment).

airween commented 1 month ago

> Do I understand correctly, that using a directive like server_name would now set the server name for the log?

I was still thinking about this question and made some tests.

First: I wouldn't touch the field names of the log; I would keep the existing fields, namely [hostname]. The reason is simple: the more similar the v2 and v3 log formats are, the easier it is to spot differences between them. And if someone has a log storage or processing system, it is easier to unify the log processing.
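To make the log-processing point concrete, here is a minimal sketch of a parser that extracts the bracketed `[name "value"]` fields, assuming the field layout seen in the log lines quoted in this thread. `parse_fields` is a hypothetical helper, not part of ModSecurity, and it does not handle escaped quotes inside values.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches fields of the form [name "value"]; escaped quotes inside values are not handled.
FIELD_RE = re.compile(r'\[(\w+) "([^"]*)"\]')


def parse_fields(line: str) -> dict[str, list[str]]:
    """Collect bracketed fields; names such as `tag` can repeat, so values are lists."""
    fields = defaultdict(list)
    for name, value in FIELD_RE.findall(line):
        fields[name].append(value)
    return dict(fields)


line = (
    'ModSecurity: Warning. detected SQLi using libinjection. '
    '[line "46"] [id "942100"] [tag "PCI/6.5.2"] '
    '[hostname "18.19.20.21"] [uri "/xmlrpc.php"]'
)
fields = parse_fields(line)
print(fields["hostname"])  # ['18.19.20.21']
print(fields["id"])        # ['942100']
```

As long as the field name stays [hostname], a parser like this needs no per-version special case; only the meaning of the value changes.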

Take a look at Nginx's server log:

[hostname "www.modsecurity.org"] ... [ref ""], client: ::1, server: www.modsecurity.org, request: "GET /?q=/bin/bash HTTP/1.1", host: "modsecurity.org"

I have a vhost in my Nginx with 2 server names:

server_name www.modsecurity.org modsecurity.org;

If I send a request with the hostname modsecurity.org, I get a log entry like the one above: server is the first server_name entry, and host is the value I sent. (I modified the connector code; this time I tried using the module context's server_name variable.)

Which one is better? Just FYI, Apache uses the Host header value. I would prefer that one too.

(I know we don't have to decide this here and now; I just finished a test and wanted to share my experience.)

theseion commented 1 month ago

I assume, that if you used default_server, nginx will fall back to the Host header as well. To me, knowing which virtual server processed the request is more useful than the contents of the Host header.

dune73 commented 1 month ago

I second @theseion, but anything is fine as long as it's not an IP address. :)

I mean in most cases, host header and servername should be in sync. So it does not matter too much (obviously, the source needs to be well documented for the remainder of the cases).

airween commented 1 month ago

> I assume, that if you used default_server, nginx will fall back to the Host header as well. To me, knowing which virtual server processed the request is more useful than the contents of the Host header.

No. I already tried that, but in the case of the default server, the [hostname] field contained the _ character, which is not usable. I suggest we use r->headers_in.server, which is more useful.
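The options compared in this thread can be summarized in a small sketch. The Python below is an illustration only, not the connector code: `pick_hostname` and its parameters are hypothetical, and the two fallback orders are possible policies rather than a decision made here. It skips `_`, the value reported above as unusable, and falls back to the server IP only as a last resort.

```python
from typing import Optional, Sequence


def pick_hostname(
    host_header: Optional[str],
    server_names: Sequence[str],
    server_ip: str,
    prefer: str = "host",
) -> str:
    """Return the value to log as [hostname] under one of two policies.

    prefer="host":   use the Host header value (what Apache logs, per this thread)
    prefer="server": use the first server_name of the matching virtual server
    """
    # "_" is the placeholder reported above as unusable in the log.
    named = [n for n in server_names if n and n != "_"]
    vhost = named[0] if named else None

    candidates = [host_header, vhost] if prefer == "host" else [vhost, host_header]
    for value in candidates:
        if value:
            return value
    return server_ip  # last resort: the current behavior described in this issue


names = ["www.modsecurity.org", "modsecurity.org"]
print(pick_hostname("modsecurity.org", names, "18.19.20.21"))                   # modsecurity.org
print(pick_hostname("modsecurity.org", names, "18.19.20.21", prefer="server"))  # www.modsecurity.org
print(pick_hostname("modsecurity.org", ["_"], "18.19.20.21", prefer="server"))  # modsecurity.org
```

Whichever policy is chosen, the source of the value needs to be documented, since the Host header and the server name can differ.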

airween commented 1 month ago

Based on this discussion and the comments on Slack, I'm going to prepare a PR that implements a new API function that can be used to set the correct hostname. Later, on the application side, we can decide which value we want to set.