Invalid Content-Type in the Legitimate dataset

The following Content-Type headers appear in the Legitimate dataset but are not valid and should be marked as Malicious.

text/plain;charset=utf8:
- The media type is "text/plain", which is valid for plain text data.
- The charset parameter should be "utf-8", not "utf8" (note the missing hyphen).
application/x-www-form-urlencoded;charset=utf-8;:
- The media type is "application/x-www-form-urlencoded", which is valid for form data submissions.
- The charset parameter is correctly specified as "utf-8".
- There is a trailing semicolon after "utf-8" with no parameter following it, which is not valid. A stray semicolon can be a sign of an attempt to inject extra content into the header.
application/json; charset=utf8:
- The media type is "application/json", which is valid for JSON data.
- As in the first example, the charset parameter should be "utf-8", not "utf8".
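
The checks described above can be sketched in a few lines of Python. This is only an illustration, not the checker used by this project: the function name `content_type_problems` and the `ACCEPTED_CHARSETS` set are hypothetical, and the sketch assumes that "utf-8" is the only charset spelling to accept and that an empty parameter after a semicolon counts as an error.

```python
import re

# HTTP "token" characters, used for the type, subtype, and parameter names.
_TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
_MEDIA_TYPE = re.compile(rf"^{_TOKEN}/{_TOKEN}$")
_PARAMETER = re.compile(rf'^({_TOKEN})=({_TOKEN}|"[^"]*")$')

# Assumption: only the hyphenated spelling is accepted by this check.
ACCEPTED_CHARSETS = {"utf-8"}


def content_type_problems(value: str) -> list[str]:
    """Return the problems found in a Content-Type value (empty list if none)."""
    problems = []
    # Naive split: a quoted parameter value containing ";" is not handled.
    media_type, *params = [part.strip() for part in value.split(";")]
    if not _MEDIA_TYPE.match(media_type):
        problems.append(f"malformed media type: {media_type!r}")
    for param in params:
        if not param:
            # Nothing between two semicolons, or after the last one.
            problems.append("empty parameter (stray semicolon)")
            continue
        match = _PARAMETER.match(param)
        if not match:
            problems.append(f"malformed parameter: {param!r}")
            continue
        name = match.group(1).lower()
        val = match.group(2).strip('"').lower()
        if name == "charset" and val not in ACCEPTED_CHARSETS:
            problems.append(f"unrecognized charset: {val!r}")
    return problems


for header in (
    "text/plain;charset=utf8",
    "application/x-www-form-urlencoded;charset=utf-8;",
    "application/json; charset=utf8",
):
    print(header, "->", content_type_problems(header))
```

Under these assumptions, each of the three values listed above yields exactly one problem, while a value such as "application/json; charset=UTF-8" yields none, because the charset comparison is case-insensitive.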

Number of cases per source:

Count	Test Name	Content-Type (lowercased)
130	browsing_realtor	"text/plain;charset=utf8"
10	browsing_samsung	"application/x-www-form-urlencoded;charset=utf-8;"
85	browsing_tumblr	"application/json; charset=utf8"

openappsec / waf-comparison-project