REQUEST_BODY is set when it should not according to documentation

mirkodziadzka-avi commented 5 years ago

I think there is a mismatch between modsec-3 implementation and modsec documentation.

According to https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-%28v2.x%29#request_body the REQUEST_BODY variable should be set only under 2 conditions.

If the WWWFormURLEncoded body processor is used
If no body processor is used and ctl:forceRequestBodyVariable is used in phase:1

In current v3 code, this variables is always set.

It is also expected to be set in regression tests, for example test/test-cases/regression/variable-REQUEST_BODY.json. This test is using the Multipart processor (set from C code and not from modsec language)

However, the regression tests of the OWASP Modsec CoreRuleset team (https://github.com/SpiderLabs/owasp-modsecurity-crs/tree/v3.2/dev/util/regression-tests/tests) expect that REQUEST_BODY is not set in some cases. For example in regression test https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/v3.2/dev/util/regression-tests/tests/REQUEST-944-APPLICATION-ATTACK-JAVA/944120.yaml#L128 against the rule

# Magic bytes detected and payload included possibly RCE vulnerable classess detected and process execution methods detected
# anomaly score set to critical as all conditions indicate the request try to perform RCE.
SecRule ARGS|ARGS_NAMES|REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_BODY|REQUEST_HEADERS|XML:/*|XML://@* \
    "@rx (?:clonetransformer|forclosure|instantiatefactory|instantiatetransformer|invokertransformer|prototypeclonefactory|prototypeserializationfactory|whileclosure|getproperty|filewriter|xmldecoder)" \
    "id:944120,\
    phase:2,\
    block,\
    t:none,t:lowercase,\
    log,\
    msg:'Remote Command Execution: Java serialization (CVE-2015-5842)',\
    logdata:'Matched Data: %{MATCHED_VAR} found within %{MATCHED_VAR_NAME}',\
    tag:'application-multi',\
    tag:'language-java',\
    tag:'platform-multi',\
    tag:'attack-rce',\
    tag:'OWASP_CRS/WEB_ATTACK/COMMAND_INJECTION',\
    tag:'WASCTC/WASC-31',\
    tag:'OWASP_TOP_10/A1',\
    tag:'PCI/6.5.2',\
    tag:'paranoia-level/1',\
    ver:'OWASP_CRS/3.1.0',\
    severity:'CRITICAL',\
    chain"
    SecRule MATCHED_VARS "@rx (?:runtime|processbuilder)" \
        "t:none,t:lowercase,\
        setvar:'tx.msg=%{rule.msg}',\
        setvar:'tx.rce_score=+%{tx.critical_anomaly_score}',\
        setvar:'tx.anomaly_score_pl1=+%{tx.critical_anomaly_score}',\
        setvar:'tx.%{rule.id}-OWASP_CRS/WEB_ATTACK/RCE-%{MATCHED_VAR_NAME}=%{MATCHED_VAR}'"

the test is expecting that the rule is not matching. Which will only happen if REQUEST_BODY is not set.

So I can either make the Modsec 3 regression tests happy or the Modsec-CoreRuleset regression tests.

Is there a plan to change the behaviour or the implementation?

Thanks

airween commented 5 years ago

Just fyi, this is a know issue, and a(n ugly) PR available - but I only made it for to discuss: https://github.com/SpiderLabs/ModSecurity/pull/2045

Also note, that there is a "feature request" for v2 - by this bug, the v3 can catch the XXE attack, v2 can't. :) https://github.com/SpiderLabs/owasp-modsecurity-crs/issues/1319

zimmerle commented 5 years ago

Hi @michaelgranzow-avi,

In version two those configuration use to be very confusing, sometimes leading the user to access a variable that wasn't even set. Fooling the results of the inspection. As of 3.1, the idea is that the variable will be filled according to its usage, so, if there is one rule that uses the REQUEST_BODY the functionality will be on, and the variable will be filled. The opposite is also true: if no variable is using the REQUEST_BODY, the feature will be disabled, therefore saving CPU cycles.

It seems that we are in the middle ground, and what you have reported seems to be an issue. Likely to be addressed in the format exposed above.

mirkodziadzka-avi commented 5 years ago

@zimmerle (by the way: @mirkodziadzka-avi here, not @michaelgranzow-avi :-) Thanks for the explanation. That REQUEST_BODY is only set when needed is an implementation detail which does not impact the semantic. And it does make sense to avoid unneeded settings.

My question is about changing the semantic:

If you are looking for CoreRuleset rules, they often have an ARGS|REQUEST_BODY construct.

In this construct, the semantic idea is that you are either able to parse the request (then use the parsed part in ARGS) or you can not parse the request because of unknown content-type and then you inspect the whole REQUEST_BODY for attack patterns. This is more or less what the documented behaviour from version 2 describes and from my point of view, this is a valid and important use case.

So my question is:

How could I express this intension otherwise? I think, the options are:

we can duplicate all rules and decided depending on your body parser if you use the ARGS rules or theREQUEST_BODY rules. Thats a maintenance nightmare
we can continue to use ARGS|REQUEST_BODY and waste a lot of CPU by scanning it twice. That could be a performance problem.

Do you have another proposal? Do I miss something?

zimmerle commented 5 years ago

Hi @mirkodziadzka-avi,

What the manual suggests is still valid, as long as you use the variable, it will be filled - giving the business logic that exists today. The same logic persists. But, regardless of ctl:forceRequestBodyVariable, if no one is using the variable it won't be filled.

How to write the rules is a decision that you have to make based on context. As you said, the amount of work that you are willing to put on it should be consider as well. Depending on the inspection context, an operator execution at ARGS could produce a completely different result for the same lookup at REQUEST_BODY.

mirkodziadzka-avi commented 5 years ago

Hi @zimmerle ... So I'm official confused now.

If you say, what the manual suggest is still valid, does this means that we do not set REQUEST_BODY, if body processor is JSON, XML or MULTIPART? Because this is what I read from the manual, what the coreruleset tests assume and what is different in current 3.x implementation.

airween commented 5 years ago

(Note, that I wrote above, I also ran into this issue here.)

@zimmerle,

the documentation said:

If the WWWFormURLEncoded body processor is used
If no body processor is used and ctl:forceRequestBodyVariable is used in phase:1

In ModSecurity2, if the CT is application/json or application/xml (therefore it's not application/x-www-form-urlencoded, so the first condition isn't met) but the ctl:forceRequestBodyVariable is turned on in phase1 (the second is met), then the REQUEST_BODY doesn't exists.

It means that this is a bug in ModSecurity2?

mirkodziadzka-avi commented 5 years ago

@airween In the case of application/json a body processor is used. So the second condition isn't met - independent of ctl:forceRequestBodyVariable

airween commented 5 years ago

hi @mirkodziadzka-avi,

imagine that it works the other way around. :)

You're talking about CoreRuleSet, and it has a rule here:

https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/6debcaf02e226508a6ba78051847ecf5a0ab76b6/rules/REQUEST-901-INITIALIZATION.conf#L290-L300

# Force body variable
SecRule REQBODY_PROCESSOR "!@rx (?:URLENCODED|MULTIPART|XML|JSON)" \
    "id:901340,\
    phase:1,\
    pass,\
    nolog,\
    noauditlog,\
    msg:'Enabling body inspection',\
    tag:'paranoia-level/1',\
    ctl:forceRequestBodyVariable=On,\
    ver:'OWASP_CRS/3.2.0'"

So, as you can see, the second condition is true. And therefore the engine uses the REQUEST-BODY as correctly.

But if this statement is true, then the modsec2 works as incorrectly, so there is an incompatibility problem between version 2 and 3.

Note, that after a quick check I can see, that there isn't any regression tests to check this behavior.

zimmerle commented 5 years ago

So, as you can see, the second condition is true. And therefore the engine uses the REQUEST-BODY as correctly. @airween we cannot assume that the CRS is workable.

zimmerle commented 5 years ago

Hi @zimmerle ... So I'm official confused now.

Me too. Let me run a couple of tests and get back to the issue.

A clear fact to me is that it should not be that difficult to understand. Let's see what we can improve in v3 to make it crystal clear, regardless of backward compatibility.

airween commented 5 years ago

Meantime I made a regression test:

https://gist.github.com/airween/2c05460fb80b723f3ca4c7235c7f6242

Please review - I think @mirkodziadzka-avi is right. In this file there are two tests. The first one turn on the forceRequestBodyVariable, and looks up the REQUEST_BODY - it matched.

But at the second test, I didn't turned on this ctl action, so I expected that the result will HTTP 200 - but not, REQUEST_BODY exists, and matched. Note, that the CT header is application/xml in both cases.

zimmerle commented 5 years ago

@airween I always mention to you the importance of having clear test cases without too much unnecessary noise. In this example you have a rule marked as chain, but without a chained rule:

https://gist.github.com/airween/2c05460fb80b723f3ca4c7235c7f6242#file-variable-request_body-xml-json-L92

That may or may not affect the behavior of the test that you want to demonstrate. Please remove all unnecessary metadata and actions to make it clear to read.

Regardless, by reading you guys conversation, It seems like we have behavior on v2, another one described in the manual and the third one in v3. Therefore, it is very hard to tell which is the correct (expected) one. But, like mentioned, I will look into it. Won't be now, but it is on my queue.

airween commented 5 years ago

@zimmerle, you're right, sorry - I just realized that there where more not need lines in the original test (for chain action), and removed them to make cleaner test :).

Here is a more simple: https://gist.github.com/airween/8deacdbe5fbea76642c61b2e3c710c1e

I think the bug also exists in v3, but please check the test.

muradmomani commented 4 years ago

@zimmerle @airween Hi, any update to this issue ?

martinhsv commented 2 years ago

Although the decision for the v3 functionality was taken before I joined the project, my leaning is to defend the current v3 behaviour as a superior alternative as compared with having it mimic how v2 works.

As mentioned earlier in the above discussion, the v2 behaviour is very confusing. Other variables populated directly from an HTTP Request are populated automatically. It is more than a little bit strange that one variable is not populated automatically that way. To make matters worse, that variable might be populated automatically if the body is parsed with exactly one of the four available body processors, but not others. If one were writing a WAF from scratch it seems unlikely that anyone would consider this the best possible implementation.

Because of those idiosyncrasies, if one does want to write a rule for v2 to rigorously examine REQUEST_BODY, it may be (depending on the overall rule set and how they are individually administered) awkward, more work, and more error prone to do so -- because you need to write a second rule to use the 'force' action to make REQUEST_BODY available to you. (This may be less of an issue for generic rules and more of an issue for rules that seek to mitigate individual CVEs.)

My current understanding is that this strange processing in <= v2 was adopted for performance reasons. And, given the v2 implementation's close ties within Apache, that important performance considerations might exist there is entirely believable. I have seen no such potential positive impacts in v3 sufficient to outweigh the negatives mentioned in the previous paragraphs.

I will touch on a specific use case that appears to be of interest in a subsequent comment.

martinhsv commented 2 years ago

Regarding this rule:

# Force body variable
SecRule REQBODY_PROCESSOR "!@rx (?:URLENCODED|MULTIPART|XML|JSON)" \
    "id:901340,\
    phase:1,\
    pass,\
    nolog,\
    noauditlog,\
    msg:'Enabling body inspection',\
    tag:'paranoia-level/1',\
    ctl:forceRequestBodyVariable=On,\
    ver:'OWASP_CRS/3.2.0'"

The intent here is presumably to have the request body content available in REQUEST_BODY if it is known that it will not be available in parsed form via other variables such as ARGS.

In such cases in v2, other rules that may use a construct like 'ARGS|REQUEST_BODY' will effectively only run against real content in one of those two.

This technique obviously won't work with current v3 functionality. But there is an alternative that would work just fine and would be more consistent with the rest of ModSecurity processing.

Instead of 'enabling' REQUEST_BODY via ctl:forceRequestBodyVariable when a 'false' condition exists, one can remove REQUEST_BODY (when the ‘true’ condition exists) from consideration from the relevant rules using either ctl:ruleRemoveTargetById or ctl:ruleRemoveTargetByTag. This would make the exclusions consistent with how other targets are handled. Moreover, this alternative is one that this would work with both v2 and v3 as they exist today.

owasp-modsecurity / ModSecurity

REQUEST_BODY is set when it should not according to documentation #2146