Closed HatScripts closed 1 month ago
const input = `
Assamese -> Assam`
Additionally, removing the quotes and leaving just the whitespace (4 spaces on each line) still results in the unexpected censoring. If you delete any of these spaces, no censoring occurs.
The error seems to be in the whitelisted term matching logic. In particular, we are using an index into the original input where we should instead be using an index into the transformed input, resulting in the second assa being skipped over*. The following diff seems to fix it, if this is urgent for you:
diff --git a/src/matcher/regexp/RegExpMatcher.ts b/src/matcher/regexp/RegExpMatcher.ts
index 7f4fdb1..af31d87 100644
--- a/src/matcher/regexp/RegExpMatcher.ts
+++ b/src/matcher/regexp/RegExpMatcher.ts
@@ -161,7 +161,7 @@ export class RegExpMatcher implements Matcher {
}
matches.insert(indices[startIndex], endIndex);
- lastEnd = endIndex + 1;
+ lastEnd = startIndex + whitelistedTerm.length;
}
}
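To illustrate why mixing the two index spaces skips a match, here is a minimal, self-contained sketch. It is not the library's actual code: `transform`, `whitelistedIntervals`, the letters-only transformer, the whitelisted term `assa`, and the sample input are all assumptions made for the example. Only the two `lastEnd` updates mirror the diff above.

```typescript
// Hypothetical transformer: keep ASCII letters only, lowercased.
// indices[i] is the position in the ORIGINAL input of transformed[i].
function transform(input: string): { indices: number[]; transformed: string } {
  const indices: number[] = [];
  let transformed = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const isLetter = (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");
    if (isLetter) {
      indices.push(i);
      transformed += ch.toLowerCase();
    }
  }
  return { indices, transformed };
}

// Returns [start, end] intervals (original-input coordinates) covered by `term`.
function whitelistedIntervals(
  input: string,
  term: string,
  useFix: boolean
): Array<[number, number]> {
  const { indices, transformed } = transform(input);
  const intervals: Array<[number, number]> = [];
  let lastEnd = 0; // search offset into the TRANSFORMED input
  for (
    let startIndex = transformed.indexOf(term, lastEnd);
    startIndex !== -1;
    startIndex = transformed.indexOf(term, lastEnd)
  ) {
    // endIndex is an index into the ORIGINAL input.
    const endIndex = indices[startIndex + term.length - 1];
    intervals.push([indices[startIndex], endIndex]);
    lastEnd = useFix
      ? startIndex + term.length // stays in transformed coordinates
      : endIndex + 1; // original coordinates leak into the search offset
  }
  return intervals;
}

// Assumed sample input: leading spaces and quotes are dropped by the transformer,
// so original indices run ahead of transformed indices.
const sample = '    "Assamese" -> "Assam"';
const withoutFix = whitelistedIntervals(sample, "assa", false);
const withFix = whitelistedIntervals(sample, "assa", true);
console.log(withoutFix, withFix);
```

Under these assumptions, the variant without the fix finds only the first occurrence, because the original-space `endIndex + 1` overshoots the start of the second occurrence in the transformed string. The variant with the fix keeps `lastEnd` in transformed coordinates and finds both.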
I will hold off on a patch release until I have time to look at this more carefully, though. The matching logic is fairly complex and I would like to refamiliarize myself with the implementation to ensure this is fully correct first (particularly in cases with non-ASCII characters.) Unfortunately, as I said in #46, this may have to wait until late this month or early February. Apologies.
*I verified that there is no security issue with OOB access due to this mismatch--it should be purely a matter of correctness.
Expected behavior
When I input the following string:
I expect that there should be no censoring.
Actual behavior
However, Assam becomes A*sam. Strangely, modifying parts of the string, such as the quotes ("), results in no censoring.
Minimal reproducible example
Steps to reproduce
Additional context
No response
Node.js version
N/A
Obscenity version
0.2.0