Open fabiomb opened 2 days ago
The "SEO spam" rule (id 13) looks for a few suspicious CSS rules and also checks post content for phrases and keywords that are commonly found in SEO spam.
The "JavaScript/charcode checks" rule (id 12) looks for embedded JavaScript or calls to `fromcharcode` in post content. Such content is usually malicious.
I will mention these rules to our team and see if they should be adjusted. Would you be willing to share any examples of matching posts that you believe to be false positives?
If rules are consistently returning false positives, you can also exclude them from the scan using the `-e` / `--exclude-rules` option. For example, to exclude the two rules you mentioned from the scan and just use the remaining rules, you can add `-e 12 -e 13` to the `db-scan` command. I don't generally recommend excluding rules as it can lead to missing actual results, but it is an option.
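If the scan is launched from a script, the repeated `-e` flags can be assembled programmatically. This is only a minimal sketch: the helper name `with_excluded_rules` is made up for illustration, and the base command below contains just the `db-scan` subcommand mentioned above, so the executable and any database connection arguments your setup needs would still have to be added.

```python
import shlex


def with_excluded_rules(base_command, rule_ids):
    """Return base_command plus one `-e <id>` pair per rule ID to exclude."""
    args = list(base_command)
    for rule_id in rule_ids:
        args += ["-e", str(rule_id)]
    return args


# Assumption: only the subcommand is shown; prepend the executable and add
# your own connection arguments before actually running this.
command = with_excluded_rules(["db-scan"], [12, 13])
print(shlex.join(command))
```

Keeping the excluded IDs in one list makes it easy to drop an exclusion again once a rule has been adjusted.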
Thanks @akenion. The two false positives are with SEO plugins (Rank Math SEO in my case), and the JavaScript one triggers when a post contains some JS, like the old-method Twitter or YouTube embeds, so yes, it's clear that could be the case.
That makes sense regarding the Twitter and YouTube embeddings. In your case, I'd advise simply excluding rule 12 ("JavaScript/charcode checks") for the time being or finding an alternate way to embed that content.
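To illustrate why an old-style embed would match a check like this, here is a rough sketch. The pattern `NAIVE_JS_CHECK` is a hypothetical stand-in, not the actual signature behind rule 12 (which is not shown in this thread); it only mirrors the description "embedded JavaScript or calls to `fromcharcode`".

```python
import re

# Hypothetical stand-in for rule 12; the real signature may differ.
NAIVE_JS_CHECK = re.compile(r"<script\b|fromcharcode", re.IGNORECASE)


def looks_like_embedded_js(post_content):
    """Return True if the content has a <script> tag or a fromCharCode call."""
    return NAIVE_JS_CHECK.search(post_content) is not None


# Old-method Twitter embed: a blockquote followed by the widgets.js script tag.
old_twitter_embed = (
    '<blockquote class="twitter-tweet"><p>Hello</p></blockquote>\n'
    '<script async src="https://platform.twitter.com/widgets.js" '
    'charset="utf-8"></script>'
)
# The kind of obfuscated payload such a check is presumably meant to catch.
obfuscated = '<img src=x onerror="eval(String.fromCharCode(97,108,101,114,116))">'
plain = "<p>Just a paragraph with a <a href='https://example.com'>link</a>.</p>"

for sample in (old_twitter_embed, obfuscated, plain):
    print(looks_like_embedded_js(sample))
```

Both the legitimate embed and the obfuscated payload contain script-like content, so a content-only check of this kind cannot tell them apart without extra context such as an allowlist of script sources.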
Can you provide a specific match for the Rank Math SEO case (an example output row from running `db-scan`)? Reviewing the rule, I'm not immediately following how that plugin would generate a false positive.
Here's results.csv with the full scan export. There are a lot of positives in really old content, but it's strange to find them in the newer content.
Thanks for sharing the results. It does look like you have legitimate content that includes keywords that are often found in SEO spam and hence the signature matches. Our team is reviewing this internally to see if any improvements can be made around this, but for now, I do recommend excluding the two problematic rules (using `-e 12 -e 13` as mentioned earlier) since they are generating a high number of false positives on your site. I will post here if that recommendation changes after further review and discussion.
I don't understand the criteria of the rule, but I tried the new CLI database scan and a lot of my articles give "Suspicious database record found in table "wp_posts" matching rule "SEO spam":..." But there's no problem with the articles cited, just a "suspicious" flag. WTF, too many links? I don't understand.
Then I got a lot of "JavaScript/charcode checks" matches when I have some Twitter posts in the post_content 🤷
Any ideas? I got 500+ results and can't find a single real problem.
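With hundreds of results, a quick tally per rule and table can make triage easier. The sketch below parses lines shaped like the message quoted above; it assumes every match is reported in that same format (only the "SEO spam" message appears verbatim in this thread), and the function name `tally_matches` is made up for illustration.

```python
import re
from collections import Counter

# Based on the message quoted above:
#   Suspicious database record found in table "wp_posts" matching rule "SEO spam":...
MATCH_LINE = re.compile(r'found in table "([^"]+)" matching rule "([^"]+)"')


def tally_matches(lines):
    """Count scan matches per (rule, table) pair, skipping unrelated lines."""
    counts = Counter()
    for line in lines:
        found = MATCH_LINE.search(line)
        if found:
            table, rule = found.groups()
            counts[(rule, table)] += 1
    return counts


# Sample lines; the second rule's message format is assumed to be the same.
sample_output = [
    'Suspicious database record found in table "wp_posts" matching rule "SEO spam":...',
    'Suspicious database record found in table "wp_posts" matching rule "SEO spam":...',
    'Suspicious database record found in table "wp_posts" '
    'matching rule "JavaScript/charcode checks":...',
    "some unrelated log line",
]

for (rule, table), count in tally_matches(sample_output).most_common():
    print(f"{count:>5}  {rule}  ({table})")
```

Sorting by count shows which rule accounts for most of the noise, which helps decide whether excluding a rule is worth the reduced coverage.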