houbb / sensitive-word

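👮‍♂️ The sensitive word tool for Java. (Sensitive words / banned words / illegal words / profanity. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Please do not post content involving politics, advertising, marketing, firewall circumvention, or anything that violates national laws and regulations. A high-performance sensitive-word detection and filtering component with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-to-pinyin conversion, fuzzy search, and more.)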
https://houbb.github.io/opensource/sensitive-word
Apache License 2.0
4.1k stars 545 forks

Some characters inside a word get replaced #45

Closed JoshuaLiew closed 6 months ago

JoshuaLiew commented 6 months ago

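For English words such as "Disburse", the letters "sb" inside the word get replaced. How should this be handled? Is it possible to replace only when the whole word matches?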

houbb commented 6 months ago

This is supported as of v0.13.0. See "wordResultCondition: further checks on matched words".

For example:

final String text = "I have a nice day。";

List<String> wordList = SensitiveWordBs.newInstance()
        .wordDeny(new IWordDeny() {
            @Override
            public List<String> deny() {
                return Collections.singletonList("av");
            }
        })
        .wordResultCondition(WordResultConditions.englishWordMatch())
        .init()
        .findAll(text);
// "av" only occurs inside "have", not as a standalone English word, so nothing is reported.
Assert.assertEquals("[]", wordList.toString());
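
To illustrate the idea behind whole-word matching, here is a minimal standalone sketch that does not use the library. It only assumes that a hit should count when the characters directly before and after it are not English letters. The class and method names (`WholeWordMatchSketch`, `findWholeWords`) are hypothetical, and this is not necessarily how `WordResultConditions.englishWordMatch()` is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class WholeWordMatchSketch {

    // ASCII letters only; digits, punctuation and CJK characters act as word boundaries here.
    static boolean isEnglishLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns each case-insensitive occurrence of `word` in `text`
    // that is not embedded in a longer run of English letters.
    static List<String> findWholeWords(String text, String word) {
        List<String> hits = new ArrayList<>();
        if (word.isEmpty()) {
            return hits;
        }
        int len = word.length();
        for (int start = 0; start + len <= text.length(); start++) {
            if (!text.regionMatches(true, start, word, 0, len)) {
                continue;
            }
            int end = start + len;
            boolean leftOk = start == 0 || !isEnglishLetter(text.charAt(start - 1));
            boolean rightOk = end == text.length() || !isEnglishLetter(text.charAt(end));
            if (leftOk && rightOk) {
                hits.add(text.substring(start, end));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findWholeWords("Disburse the funds", "sb")); // prints []
        System.out.println(findWholeWords("I have a nice day。", "av")); // prints []
        System.out.println(findWholeWords("you sb!", "sb"));            // prints [sb]
    }
}
```

With this check, "sb" inside "Disburse" is skipped because it has letters on both sides, while a standalone "sb" is still reported.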