houbb / sensitive-word

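👮‍♂️The sensitive word tool for java. (Sensitive words / banned words / illegal words / profanity. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Please do not post content involving politics, advertising, marketing, firewall circumvention, or violations of national laws and regulations. A high-performance sensitive-word detection and filtering component with traditional/simplified Chinese conversion, full-width/half-width conversion, Chinese-to-pinyin conversion, fuzzy search, and other features.)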
https://houbb.github.io/opensource/sensitive-word
Apache License 2.0
4.1k stars 545 forks

Reporting a problem I ran into #40

Closed hq112415 closed 4 months ago

hq112415 commented 9 months ago

The code is:

    final List<String> words = Lists.newArrayList("fuck", "cao", "shit", "shift");

    SensitiveWordBs holder = SensitiveWordBs.newInstance()
            .enableNumCheck(false)
            .wordDeny(() -> words)
            .init();

    String word = "*OW,awv 10720hj\uD83D\uDC42YRLH\uD83D\uDE4Cl.L #N, E in cR,aQlgQ. 9k.a+47 gfj172.J59 ptjaUSERa ]#075'8\uD83D\uDE46asYKO.QRW.XVMYV 713e5*foovco6tm9# kP G.SLKPR 'GA14998b\\nGjiw a7jz l,aaeeg\\nphd  jgt y\\n@zamnko ";
    System.out.println(holder.findAll(word));

Hits: [asYKO.QRW.XVMYV, G.SLKPR]

houbb commented 9 months ago

These were probably matched as URLs or something similar; you can debug locally to confirm. URL detection can be turned off.
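
To illustrate why dot-separated tokens such as `asYKO.QRW.XVMYV` can look like URLs, here is a minimal standalone sketch using only the JDK. The `URL_LIKE` pattern, the `UrlLikeDemo` class, and `findUrlLike` are hypothetical names made up for this illustration; this is not the regex or the code that sensitive-word actually uses, only a loose "labels joined by dots" pattern of the kind a URL check might apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlLikeDemo {
    // ASSUMPTION: a deliberately loose, hypothetical "domain-like" pattern:
    // alphanumeric labels joined by dots, ending in a label of 2+ letters.
    // It is NOT taken from the sensitive-word source code.
    private static final Pattern URL_LIKE =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)*\\.[A-Za-z]{2,}");

    // Collects every substring of the text that matches the loose pattern.
    static List<String> findUrlLike(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = URL_LIKE.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // A fragment of the input string from the report above.
        String text = "8\uD83D\uDE46asYKO.QRW.XVMYV 713e5*foovco6tm9# kP G.SLKPR 'GA";
        System.out.println(findUrlLike(text));
    }
}
```

Neither token contains any word from the deny list, so a hit like this would have to come from a separate check rather than from the word dictionary. If URL detection is indeed the cause, the builder option to try is, as far as I know, `enableUrlCheck(false)`, set alongside `enableNumCheck(false)` before `init()`; check the README of the version you are using to confirm the option name.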