openappsec / waf-comparison-project

Testing datasets and tools to compare WAF efficacy
https://www.openappsec.io
Apache License 2.0

Invalid Content-Type in the Legitimate dataset #3

Closed: udi-aharon closed this issue 1 year ago

udi-aharon commented 1 year ago

The following Content-Type header values appear in the Legitimate dataset. They are not valid and should be marked as Malicious.

  1. text/plain;charset=utf8:

    • The media type is "text/plain," which is valid for plain text data.
    • The charset parameter should be "utf-8," not "utf8" (note the hyphen).
  2. application/x-www-form-urlencoded;charset=utf-8;:

    • The media type is "application/x-www-form-urlencoded," which is valid for form data submissions.
    • The charset parameter is correctly specified as "utf-8."
    • There's a trailing semicolon after "utf-8" which is not valid. Semicolons can be used to inject malicious code into the header.
  3. application/json; charset=utf8:

    • The media type is "application/json," which is valid for JSON data.
    • Similar to the first example, the charset parameter should be "utf-8," not "utf8."
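
The checks described in the three items above could be automated along these lines. This is only a minimal sketch, assuming the strict reading used in this issue (the charset must be spelled "utf-8", and no empty parameter may follow a semicolon). The function name `content_type_problems` is hypothetical and is not part of the project's tooling.

```python
def content_type_problems(value):
    """Return a list of problems found in a Content-Type value.

    Applies only the two strict rules discussed in this issue;
    it is not a full media-type parser.
    """
    problems = []
    media_type, *params = value.split(";")
    if "/" not in media_type.strip():
        problems.append("missing type/subtype")
    for raw in params:
        param = raw.strip()
        if not param:
            # An empty segment means a stray semicolon, e.g. a trailing ";".
            problems.append("empty parameter (stray or trailing semicolon)")
            continue
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset" and val.strip().strip('"').lower() == "utf8":
            problems.append('charset is "utf8"; expected "utf-8"')
    return problems


for header in (
    "text/plain;charset=utf8",
    "application/x-www-form-urlencoded;charset=utf-8;",
    "application/json; charset=utf8",
    "application/json; charset=utf-8",
):
    print(repr(header), content_type_problems(header))
```

The first three values are the ones reported here and each yields one problem; the last one is included as a value that passes both rules.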

Number of cases per source:

Count  Test Name         Content-Type (lowercased)
130    browsing_realtor  "text/plain;charset=utf8"
10     browsing_samsung  "application/x-www-form-urlencoded;charset=utf-8;"
85     browsing_tumblr   "application/json; charset=utf8"

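
A tally like the one above could be produced roughly as follows. This is a sketch under assumptions: the input is taken to be an iterable of (test name, Content-Type) pairs, which may not match the real dataset layout, and the sample rows are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

# The three values flagged in this issue, already lowercased.
FLAGGED = {
    "text/plain;charset=utf8",
    "application/x-www-form-urlencoded;charset=utf-8;",
    "application/json; charset=utf8",
}


def count_flagged(records):
    """Count flagged Content-Type values per (test name, lowercased value).

    `records` is a hypothetical iterable of (test_name, content_type) pairs.
    """
    counts = Counter()
    for test_name, content_type in records:
        lowered = content_type.lower()
        if lowered in FLAGGED:
            counts[(test_name, lowered)] += 1
    return counts


# Made-up rows for illustration only, not taken from the dataset.
sample = [
    ("browsing_realtor", "text/plain;charset=UTF8"),
    ("browsing_realtor", "text/plain;charset=utf8"),
    ("browsing_tumblr", "application/json; charset=utf8"),
    ("browsing_tumblr", "application/json; charset=utf-8"),
]

for (test_name, value), n in count_flagged(sample).most_common():
    print(n, test_name, f'"{value}"')
```
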
Boris-Rozenfeld commented 1 year ago

Thank you @udi-aharon for the note. The traffic used in the legit dataset is based on real-world traffic that, surprisingly, included the 'utf8' string. We acknowledge that it should be 'utf-8', and apparently the app developers have since changed it as well (one explanation may be the use of certain .NET libraries that were later updated). In any case, we will manually change it the next time we update the dataset with real-world traffic. Thanks again.