Problem with Russian cyrillic characters and rule id 960024

owasp-modsecurity / ModSecurity

ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx. It has a robust event-based programming language which provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis.

https://www.modsecurity.org

Apache License 2.0


Problem with Russian cyrillic characters and rule id 960024 #708

Closed: saymne closed this issue 7 years ago.

saymne commented 10 years ago

Hi,

Our environment: Windows 2008 Server R2 Standard; IIS 7.5; ModSecurity IIS 2.7.7 (64-bit installer).

We have a Russian version of our website, with pages that contain Russian (Cyrillic) letters. We have no problem with ModSecurity IIS 2.7.7 when these pages are opened. The problem is with the site's search query. If we enter more than two Russian (Cyrillic) letters as the search query value, access is blocked by rule 960024 (the same happens if we enter more than two non-ASCII characters, e.g. š, č, ć, ž, đ). This is a problem, because searches for ordinary Russian terms/expressions are blocked by rule 960024 (Meta-Character Anomaly Detection Alert - Repetative Non-Word Characters).

URL encoding is working. I also checked that t:urlDecodeUni exists in rule 960024. So the question is why more than two non-ASCII characters in an argument value are always blocked by this rule.
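A minimal sketch of why short non-ASCII search terms trip the rule. It assumes (this is not confirmed in the thread) that after `t:urlDecodeUni` the rule's `\W{4,}` is applied to the raw UTF-8 bytes and that PCRE is not running in UTF-8 mode. Python's `re` with a bytes pattern stands in for PCRE here: in that mode `\W` matches any byte outside `[A-Za-z0-9_]`, so every byte of a Cyrillic or Croatian letter counts as a non-word character. The helper name `triggers_960024` is hypothetical.

```python
import re

# Stand-in for the 960024 pattern applied byte by byte. With a bytes pattern,
# Python's \W matches any byte that is not an ASCII letter, digit or "_".
NON_WORD_RUN = re.compile(rb"\W{4,}")


def triggers_960024(term: str) -> bool:
    """Hypothetical helper: does the UTF-8 form of `term` contain 4+ non-word bytes in a row?"""
    # What remains after URL decoding, e.g. "%D0%B4" -> b"\xd0\xb4" (the letter д).
    raw = term.encode("utf-8")
    return NON_WORD_RUN.search(raw) is not None


for term in ("search", "š", "šč", "да", "Москва"):
    raw = term.encode("utf-8")
    print(f"{term}: {len(term)} chars, {len(raw)} bytes, match={triggers_960024(term)}")
```

Each of these letters takes two bytes in UTF-8, so a term of only a few letters already produces the four consecutive "non-word" bytes the pattern looks for, while a plain ASCII word does not.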

Best regards, Sasa

shrv commented 9 years ago

Good afternoon.

Try replacing the regular expression \W{4,} in rule 960024 with [^A-Za-zА-Яа-яё0-9_]{4,}.

Although PCRE claims support for the localization of many languages, Russian letters apparently are not included.

This solved the problem on my web resources.
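A sketch of what the suggested class does if the pattern is matched against raw UTF-8 bytes and the rules file itself is saved as UTF-8 (both are assumptions). Encoding the pattern turns each Cyrillic letter into two bytes, so the class becomes a set of individual bytes rather than letters. For the sample words below it still prevents runs of four "foreign" bytes, which is consistent with the result reported above, while an ASCII meta-character run is still caught. Python's `re` stands in for PCRE; `byte_mode_search` is a hypothetical helper.

```python
import re

ORIGINAL = r"\W{4,}"
SUGGESTED = r"[^A-Za-zА-Яа-яё0-9_]{4,}"


def byte_mode_search(pattern: str, text: str) -> bool:
    """Hypothetical helper: match a UTF-8 encoded pattern against UTF-8 bytes,
    so every non-ASCII character in the pattern becomes two separate bytes."""
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


for text in ("Москва", "поиск по сайту", "'; --"):
    print(repr(text),
          "original:", byte_mode_search(ORIGINAL, text),
          "suggested:", byte_mode_search(SUGGESTED, text))
```

In this byte-level reading the encoded ranges collapse to "exclude every byte from 0x8F to 0xD1", which covers the lead bytes 0xD0/0xD1 of all Russian letters, so the remaining continuation bytes stay isolated and never form a run of four.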

saymne commented 9 years ago

I will try the following and send feedback: replace the regular expression \W{4,} with [^A-Za-zА-Яа-яё0-9_]{4,} in rule 960024.

Thanks for the suggestion.

saymne commented 9 years ago

Obviously, \W{4,} is equivalent to [^A-Za-z0-9_]{4,}. If I want to include Russian characters, the question is: is [АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя] equal to [А-Яа-я] or to [А-ЯЁЙа-яёй]? One of these two options probably needs to be used:

Replace \W{4,} with [^A-Za-zА-Яа-я0-9_]{4,}
Replace \W{4,} with [^A-Za-zА-ЯЁЙа-яёй0-9_]{4,}

Do you know which option covers all Russian characters?
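One way to answer the question is to test each candidate class against the full alphabet. In Unicode, А-Я spans U+0410 to U+042F and а-я spans U+0430 to U+044F, so Й (U+0419) and й (U+0439) already fall inside the ranges, while Ё (U+0401) and ё (U+0451) do not. The check below runs on Python `str` objects, i.e. character by character, which assumes the class is applied to characters rather than bytes; the helper name `missing` is hypothetical.

```python
import re

# The 33 uppercase and 33 lowercase letters of the Russian alphabet,
# copied from the comment above.
UPPER = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
ALPHABET = UPPER + LOWER


def missing(char_class: str) -> str:
    """Hypothetical helper: return the Russian letters that `char_class` does NOT match."""
    pattern = re.compile(char_class)
    return "".join(ch for ch in ALPHABET if not pattern.fullmatch(ch))


for cls in ("[А-Яа-я]", "[А-ЯЁЙа-яёй]", "[А-ЯЁа-яё]"):
    print(f"{cls:14} misses: {missing(cls)!r}")
```

By this check [А-Яа-я] misses only Ё and ё, while [А-ЯЁЙа-яёй] covers all 66 letters; the extra Й/й are redundant but harmless, and the more compact [А-ЯЁа-яё] covers them all as well.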

Useful link: http://stackoverflow.com/questions/14906232/regular-expressions-with-the-cyrillic-alphabet

saymne commented 9 years ago

The problem is deeper, since the regular expression in rule 960024 (and in all other rules) matches only the lower byte of the hexadecimal value after URL encoding. Example: D1 8F is the hexadecimal value of я; after URL encoding, я is represented as %D1%8F. If we replaced \W{4,} with [^я]{4,} in rule 960024, then all Russian characters whose lower byte is 8F would not be matched by this pattern. This means that not only repeated я (4 or more times) would be ignored, but also other Russian characters whose lower byte after URL encoding is 8F. This is not good. The regular expression should match both bytes of the URL-encoded value. This is the main problem here: the Core Rule Set or the WAF engine needs to be fixed to work correctly in these cases.
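A minimal sketch of the byte-splitting effect described above, assuming the pattern and the input are both compared as UTF-8 bytes (Python's bytes regex stands in for PCRE without UTF-8 mode). Encoded, `[^я]` becomes a class of the two separate bytes 0xD1 and 0x8F, so any letter containing either byte is skipped too, not only я. The helpers `byte_mode` and `char_mode` are hypothetical.

```python
import re

PATTERN = "[^я]{4,}"  # the experiment described above


def byte_mode(pattern: str, text: str) -> bool:
    # Pattern and input both as UTF-8 bytes: "[^я]" becomes b"[^\xd1\x8f]",
    # a class of two separate bytes rather than one two-byte letter.
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


def char_mode(pattern: str, text: str) -> bool:
    # Pattern and input as Unicode strings: "[^я]" means "any character except я".
    return re.search(pattern, text) is not None


for word in ("яяяя", "рррр", "жжжж"):
    print(word, word.encode("utf-8").hex(" "),
          "byte mode:", byte_mode(PATTERN, word),
          "char mode:", char_mode(PATTERN, word))
```

In byte mode "рррр" (d1 80 repeated) is ignored even though it contains no я, because every other byte is 0xD1; in character mode я is treated as one unit and only "яяяя" is skipped.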

Useful links: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024

http://nature.berkeley.edu/~casterln/modsecurity/html-multipage/06-transformation-functions.html

urlEncode - encodes input using URL encoding.

urlDecodeUni - In addition to decoding %xx like urlDecode, urlDecodeUni also decodes %uXXXX encoding (only the lower byte will be used, the higher byte will be discarded).
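A rough sketch of the transformation as the quoted documentation describes it, written only to illustrate that description: `%xx` escapes decode to one byte each, so the UTF-8 form `%D1%8F` still yields both bytes of я, whereas the `%uXXXX` form keeps only the low byte. `url_decode_uni_sketch` is a hypothetical name, not ModSecurity's code; it ignores details such as `+` handling and any `SecUnicodeMapFile` mapping, which the real engine may apply.

```python
import re

HEX2 = re.compile(r"[0-9A-Fa-f]{2}")
HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def url_decode_uni_sketch(s: str) -> bytes:
    """Hypothetical illustration of the documented urlDecodeUni behaviour:
    %xx -> that byte; %uXXXX -> only the low byte of XXXX; other chars as UTF-8."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "%" and s[i + 1:i + 2] in ("u", "U") and HEX4.fullmatch(s[i + 2:i + 6]):
            out.append(int(s[i + 2:i + 6], 16) & 0xFF)  # high byte discarded
            i += 6
        elif s[i] == "%" and HEX2.fullmatch(s[i + 1:i + 3]):
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            out += s[i].encode("utf-8")  # literal character, passed through
            i += 1
    return bytes(out)


print(url_decode_uni_sketch("%D1%8F"))  # -> b'\xd1\x8f' (both bytes of я kept)
print(url_decode_uni_sketch("%u044F"))  # -> b'O' (0x4F, the low byte of U+044F)
```

Under the documented behaviour, the lower-byte truncation applies to the `%uXXXX` form; the `%xx` form used in the original report decodes to the full two-byte sequence.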

shrv commented 9 years ago

Good afternoon.

The closer expression is [А-ЯЁЙа-яёй], although I think you are right that it is more reliable to use the full alphabet [АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя]. It is strange that it did not help you. For me, the number of false positives decreased roughly fourfold.

What remains is not serious (I use Anomaly Scoring Detection Mode).

Regarding your second message: I already wrote about this too, though not in such clever words :), but have not yet received a clear response.

https://github.com/SpiderLabs/owasp-modsecurity-crs/issues/220

saymne commented 9 years ago

Basically, I think the main cause of false positives related to non-English characters is the urlDecodeUni transformation function, which discards the higher byte of the URL encoding.

The right solution would be to fix this bug and keep the current syntax of the relevant rules. But as I saw from your issue and other similar open issues, the solution will be to modify the rules in CRS 3.0 so that they exclude these characters from matching.

I tried the latest version, ModSecurity 2.9.0 for IIS, with the latest CRS 2.2.9-9, and the problem still exists with the characters from other languages mentioned above.

shrv commented 9 years ago

So we still wait for the new rule set and an engine release that supports it, then deploy it on a test machine, check for bugs and hope that the number of false positives is reduced.

I tried a few rules from version 3.0. I have engine version 2.7.7 under Apache. With rule 981318, for example, I see false positives.

saymne commented 9 years ago

I just checked the syntax of rule 960024 in CRS 3.0.0-dev, and it is the same as in CRS 2.2.9.

csanders-git commented 9 years ago

hey @saymne, can you describe the detail with urlDecodeUni a little bit more... I apologize; as an English speaker I am just dense to this problem :-P. @zimmerle what are your thoughts here?

csanders-git commented 9 years ago

A very similar problem...

Hi,

Our environment: Windows 2012 Server R2 Standard; Apache/2.4.12 (Win64) OpenSSL/1.0.1m PHP/5.6.9; ModSecurity for Apache/2.8.0; OWASP_CRS/2.2.9

We have an Arabic version of our website, with pages that contain Arabic letters. We don't have any problem with ModSecurity when posting English text. The problem occurs when anyone POSTs comments with Arabic letters.

This is the message from the Apache log:

Message: Access denied with code 403 (phase 2). Pattern match "\W{4,}" at ARGS:comment. [file "C:/Apache24/conf/crs/activated_rules/modsecurity_crs_40_generic_attacks.conf"] [line "37"] [id "960024"] [rev "2"] [msg "Meta-Character Anomaly Detection Alert - Repetative Non-Word Characters"] [data "Matched Data: \xd8\xb5\xd8\xa8\xd8\xa7\xd8\xad \xd8\xa7\xd9\x84\xd8\xae\xd9\x8a\xd8\xb1 found within ARGS:comment: \xd8\xb5\xd8\xa8\xd8\xa7\xd8\xad \xd8\xa7\xd9\x84\xd8\xae\xd9\x8a\xd8\xb1"] [ver "OWASP_CRS/2.2.9"] [maturity "9"] [accuracy "8"]

We added these to httpd.conf, but still have the same issue!!

SecUnicodeMapFile crs\unicode.mapping
SecUnicodeCodePage 1256

Best regards, shadi
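The bytes in the log's "Matched Data" can be checked directly. A minimal sketch, again using Python's bytes regex as a stand-in for PCRE without UTF-8 mode (an assumption): the logged bytes decode as UTF-8 to the Arabic phrase "صباح الخير" ("good morning"), and every byte, including the space, is outside `[A-Za-z0-9_]`, so a byte-wise `\W{4,}` matches the whole value, as the log reports. A Unicode-aware `\W` on the decoded text treats the Arabic letters as word characters and finds no run.

```python
import re

# Bytes copied from the "Matched Data" field of the log entry above.
LOGGED = (b"\xd8\xb5\xd8\xa8\xd8\xa7\xd8\xad "
          b"\xd8\xa7\xd9\x84\xd8\xae\xd9\x8a\xd8\xb1")

text = LOGGED.decode("utf-8")
byte_hit = re.search(rb"\W{4,}", LOGGED)  # byte-wise, ASCII-only \W
char_hit = re.search(r"\W{4,}", text)     # Unicode-aware \W on decoded text

print(text)
print("byte mode matched", len(byte_hit.group()) if byte_hit else 0, "of", len(LOGGED), "bytes")
print("char mode match:", char_hit)
```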

csanders-git commented 9 years ago

It appears that this is a problem with how PCRE handles foreign chars... So our regexes in CRS need to take this into account.

odesk2dot2by commented 9 years ago

I support a site with sub-sites on subdomains in 22 languages. For every language except English there are problems with rule ID 960024. In my case I update the rule's target (excluding variables from analysis). It is simpler, though unfortunately it weakens the protection. I think you need to disable the 'original' rule (English style) for the URIs with problems, make a copy with a new ID and add your changes. IMHO that is the more correct way. As you (saymne) said, it is a deeper problem in the code stack.

best regards, Andrei upwork.link

victorhora commented 7 years ago

Closing this one, as it's more related to the CRS rules and it's already being handled in OWASP ModSecurity CRS #220 and OWASP ModSecurity CRS #226.