Open airween opened 2 weeks ago
does ModSecurity use the libxml2 SAX parser? When I read SecRequestBodyXMLLimit
it sounds to me like it's not. Presuming that ModSecurity could collect all elements that rules might want to inspect (which would rule out the use of sub-tree wild cards), ModSecurity could simply keep track of those elements and discard everything else, only a single pass would be needed and no extra memory than what is needed to store the elements that might be inspected.
One very frustrating thing at the moment is that JSON targets can't be matched using wild cards. For example, if I wanted to match / exclude attribute
from all nodes, I couldn't write json.level1.*.attribute
but would have to to list every combination. Listing every possible combination may actually not be possible at all. This is not only a convenience issue but can also make it easy to bypass rules.
while the argument names in libmodsecurity3 are much better than in v2, the numbering of list elements is a potential problem for when you want to match any element in a potentially large list. A rule would need exclude the complete array or enumerate every entry in the list, which isn't possible. This will always give attackers a way to bypass such a rule by simply placing the payload at the first index that isn't being inspected.
I suggest focusing on one feature at a time. For me, that would currently be giving the user the full ability to inspect XML contents and nodes. Other features can be added later on. IMO, one really good feature is often much more valuable than many of mediocre quality.
mod_security2 uses https://gitlab.gnome.org/GNOME/libxml2/
On the long term, it looks logical to have the same behaviour in v2 & v3, and also with XML & JSON. We have to manage that path (compatibility), and also to choose the best features. One question that need to be raised: do we consider XML as a priority? Some people still need it, obviously, but how many? Does it worth the effort on the long term? Ther's no judgement here, just an open question.
Remark about v2 JSON parsing: you also have arrays appearing in the ARGS names, but without index, like "json.b.array.a1". My 2 cents: from a security point of view, it's better than having an index as the order of the arrays makes no sense for a parser (and I don't think you can guarantee the order).
About limiting the parsing: This could be a gain in case of huge payload. We may want to parse only some items or exclude some.
About Wildcards in JSON: regex in targets is definitely a must
Another feature asked by many people is the possibility to parse a JSON from a custom variable (like the value of a cookie, maybe after base64-decode - yes, everybody thinks about JWT). This feature should probably be discussed from scratch when designing a new parsing (but, as @theseion said, features should be implemented one at a time).
About Wildcards in JSON: regex in targets is definitely a must Btw, it's possible to create a rule to match the targets and add them one by one. It's far from optimal, but it works.
XML parser
does ModSecurity use the libxml2 SAX parser?
yes. Both version uses libxml2.
When I read
SecRequestBodyXMLLimit
it sounds to me like it's not.
Sorry, but I don't understand this. Where is the SecRequestBodyXMLLimit
keyword? I can't find it either in the documentation or in my initial comment.
Presuming that ModSecurity could collect all elements that rules might want to inspect
yes - the question is here that what ModSecurity might want to inspect? Eg. is it necessary the tags' arguments?
Wild cards
One very frustrating thing at the moment is that JSON targets can't be matched using wild cards. For example, if I wanted to match / exclude
attribute
from all nodes, I couldn't writejson.level1.*.attribute
Why? This works for me:
SecRule ARGS:/level1\.*/ "@rx foobar" \
"id:1007,\
phase:2,\
pass,\
log,\
msg:'Phase 1: Parameters: %{MATCHED_VAR_NAME} %{MATCHED_VAR}'"
the request:
curl -H "Content-Type: application/json" -d '{"level1":{"key1":{"arg":"foobar"},"key2":{"arg":"foobar"}},"level2":{"key1":{"arg":"foobar"},"key2":{"arg":"foobar"}}}' http://localhost/post
then I see in the log:
ModSecurity: Warning. Pattern match "foobar" at ARGS:level1.key1.arg. [file "/home/airween/src/coreruleset/rules/REQUEST-900-EXCLUSION-RULES-BEFORE-CRS.conf"] [line "236"] [id "1007"] [msg "Phase 1: Parameters: ARGS:level1.key1.arg foobar"] [hostname "localhost"] [uri "/post"] [unique_id "ZoLG_jDbBliqNtRf162A5QAAAAA"]
ModSecurity: Warning. Pattern match "foobar" at ARGS:level1.key2.arg. [file "/home/airween/src/coreruleset/rules/REQUEST-900-EXCLUSION-RULES-BEFORE-CRS.conf"] [line "236"] [id "1007"] [msg "Phase 1: Parameters: ARGS:level1.key2.arg foobar"] [hostname "localhost"] [uri "/post"] [unique_id "ZoLG_jDbBliqNtRf162A5QAAAAA"]
but don't see the args with name level2....
.
May be this is something you need.
JSON
while the argument names in libmodsecurity3 are much better than in v2, the numbering of list elements is a potential problem for when you want to match any element in a potentially large list. A rule would need exclude the complete array or enumerate every entry in the list, which isn't possible. This will always give attackers a way to bypass such a rule by simply placing the payload at the first index that isn't being inspected.
Ahm, I think I see what you mean. Needs to figure out the correct handling.
General
I suggest focusing on one feature at a time. For me, that would currently be giving the user the full ability to inspect XML contents and nodes.
In case of content do you think about the whole XML raw content?
Now the demand is that we need the XML's key:value pairs, especially generate exclusions. (Regarding to nodes.)
Other features can be added later on. IMO, one really good feature is often much more valuable than many of mediocre quality.
Sorry, what "other features" do you think about?
One question that need to be raised: do we consider XML as a priority? Some people still need it, obviously, but how many? Does it worth the effort on the long term? Ther's no judgement here, just an open question.
I don't know how many users expects it - a feature request received in private, which is quite urgent, but they use the mainline mod_security2, so we can't provide it as a "custom feature", because later that would lead collisions.
But yes, your question is legal - this is not the most priority, but regarding the circumstances, we can't avoid that.
Sorry, but I don't understand this. Where is the SecRequestBodyXMLLimit keyword? I can't find it either in the documentation or in my initial comment.
I meant to write SecRequestBodyXMLDepthLimit
Wild cards One very frustrating thing at the moment is that JSON targets can't be matched using wild cards. For example, if I wanted to match / exclude attribute from all nodes, I couldn't write json.level1.*.attribute
Why? This works for me:
It works with @rx
but not with ctl:removeTargetById
, AFAIK.
Remark about v2 JSON parsing: you also have arrays appearing in the ARGS names, but without index, like "json.b.array.a1". My 2 cents: from a security point of view, it's better than having an index as the order of the arrays makes no sense for a parser (and I don't think you can guarantee the order).
Okay, pro: security reason, con: can't reference to an exact item.
This is why I started this discussion to collect pros and cons for functionalities.
Wild cards One very frustrating thing at the moment is that JSON targets can't be matched using wild cards. For example, if I wanted to match / exclude attribute from all nodes, I couldn't write json.level1.*.attribute Why? This works for me:
It works with
@rx
but not withctl:removeTargetById
, AFAIK.
Ah, I see - thanks.
In case of content do you think about the whole XML raw content?
Not raw necessarily. I just think that being able to inspect values and nodes properly has to be the most important thing. Raw sounds expensive, maybe that's not necessary.
Sorry, what "other features" do you think about?
You wrote additional validation, for example.
I meant to write
SecRequestBodyXMLDepthLimit
Thanks. But why do you think that reading this keyword it sounds to you like ModSecurity does not use libXML?
SAX, not libxml.
libxml2 has a SAX parser, so XML can be streamed. For streams, a depth limit doesn't make much sense (I think).
In case of content do you think about the whole XML raw content?
Not raw necessarily. I just think that being able to inspect values and nodes properly has to be the most important thing. Raw sounds expensive, maybe that's not necessary.
Raw appears in REQUEST_BODY
(by a bug in libmodsecurity3, and perhaps you can force it with ctl:forceRequestBodyVariable
in v2).
But for your idea: could you write an example?
Another feature asked by many people is the possibility to parse a JSON from a custom variable (like the value of a cookie, maybe after base64-decode - yes, everybody thinks about JWT). This feature should probably be discussed from scratch when designing a new parsing (but, as @theseion said, features should be implemented one at a time).
This seems to me like a new transformation - may be we should open a new discussion for that.
Tree based XML parser:
tree := parse(xmlString)
for (node in nodeWalk(tree)) {
// eval rule
}
Stream based XML parser (SAX):
stream := parseStreaming(xmlString)
while (!stream.atEnd()) {
node := stream.next()
if (anyRuleNeedsNode(node) {
cachedNodes.add(node)
}
}
...
// eval rules using cachedNodes
Access any node below a given level with a certain name:
ctl:removeTargetById=999999;ARGS:xml.level1.*.myNode
Access any attribute below a given level with a certain name:
ctl:removeTargetById=999999;ARGS:xmlAttributes.level1.*.myAttribute
Access to prelude:
SecRule ARGS:XML_PRELUDE "@rx DOCTYPE"
Access to trailer:
SecRule ARGS:XML_TRAILER "@rx NOT PART OF MY XML.*"
In short, raw access is probably a good idea, but there should be other options, as applying a regular expression to a huge string is never a good idea. Usually, the scope can be reduced significantly, provided that it's possible to access everything.
Remark about v2 JSON parsing: you also have arrays appearing in the ARGS names, but without index, like "json.b.array.a1". My 2 cents: from a security point of view, it's better than having an index as the order of the arrays makes no sense for a parser (and I don't think you can guarantee the order).
Okay, pro: security reason, con: can't reference to an exact item.
This is why I started this discussion to collect pros and cons for functionalities.
Why would you want check the second item only, knowing that I could evade your check by switching both items?
Why would you want check the second item only, knowing that I could evade your check by switching both items?
I don't want to check only the second item - I would expect that the engine check all items in the list.
Tree based XML parser:
... Stream based XML parser (SAX):
These are called in libxml2 DOM parser and SAX parser.
May be we should inspect the efficiency and performance of both parsers.
Examples of what access could look like
Access any node below a given level with a certain name:
ctl:removeTargetById=999999;ARGS:xml.level1.*.myNode
Access any attribute below a given level with a certain name:
ctl:removeTargetById=999999;ARGS:xmlAttributes.level1.*.myAttribute
Ah, I see, so with these prefixing we can divide the the nodes' key:pair list but also we can access to attributes. Thanks for the idea, we should take a look.
Access to prelude:
SecRule ARGS:XML_PRELUDE "@rx DOCTYPE"
Access to trailer:
SecRule ARGS:XML_TRAILER "@rx NOT PART OF MY XML.*"
In short, raw access is probably a good idea, but there should be other options, as applying a regular expression to a huge string is never a good idea. Usually, the scope can be reduced significantly, provided that it's possible to access everything.
Right. But first, as you wrote, we should start with one function, then we can continue with the others.
I made a small example which could be the base of future XML parser. The parser uses libxml2's SAX parser, the newest version (v2 - the old methods and structures are deprecated).
Please take a look at that: https://github.com/airween/saxparser_example
You should try that with other XML's, and make investigations with Valgrind or another memory testers.
Any feedback is welcome.
Describe the bug
It's not a bug but a discussion about a new feature, how can we extend the XML processing.
There is a feature request from a customer that we should extend the engines' XML parsing capability. Of course, we should add this request to both engine with same behavior.
Current behavior
Consider this payload:
This payload will appear in current state in the engines:
(mod_security2)
(libmodsecurity3)
(lines from debug.logs)
Problem
The problem is that exclusions for sub-parts and specific nodes does not work. See the example:
because the XML variable holds the concatenated node values, not a key:value pairs like JSON. Therefore it's impossible to create any exclusion against any rules.
Possible solution
Consider this converted strcture (XML to JSON):
This payload will expanded like this:
(mod_security2)
(libmodsecurity3)
The idea is to transform the XML structure in a similar way.
Example:
(libmodsecurity3)
Possible risks
How can we avoid/handle the risks?
We can put the decision in the hands of the user, whether he wants to see the new collection under the
ARGS
or not - so introduce a new configuration keyword, eg.SecParseXMLintoArgs
(consider the optional runtime config, eg.ctl:parseXMLintoArgs
)As in case of JSON, introduce a new configuration keyword which controls the maximum number of XML levels that can be analyzed, eg.
SecRequestBodyXMLDepthLimit
(see SecRequestBodyJSONDepthLimit)More todo's
We have to:
For the last item: the behavior of JSON parsing in two versions are different. Consider the payload
{"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]}
(see that there is a list!) which is equivalent with this XML:which produces these results:
(mod_security2)
(libmodsecurity3)
Note, that please check the list items with the same keys! I think we should follow the libmodsecurity3's behavior - but the the XML and JSON won't be compatible. (Which implies the next question: do we want to align the mod_security2's behavior?)
Any feedback are welcome!