mayabot / mynlp

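A production-grade, high-performance, modular, and extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation error correction, BM25, person name recognition, named entity recognition, custom dictionaries)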
https://mynlp.mayabot.com/
Apache License 2.0
675 stars, 90 forks

How do I add my own dictionary? #27

Open zz1559152814 opened 4 years ago

jimichan commented 4 years ago
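        // Create an in-memory custom dictionary (initially empty).
        MemCustomDictionary memCustomDictionary = new MemCustomDictionary();

        // Build a core lexer and register the custom dictionary as a plugin.
        FluentLexerBuilder builder = Lexers.coreBuilder();
        builder.with(new CustomDictionaryPlugin(memCustomDictionary));
        Lexer tokenizer = builder.build();

        // Print the pipeline configuration, then segment before the word is added.
        System.out.println(tokenizer);
        System.out.println(tokenizer.scan("欢迎来到松江临港科技城"));

        // Add a custom word and rebuild the dictionary, then segment again.
        memCustomDictionary.addWord("临港科技城");
        memCustomDictionary.rebuild();
        System.out.println(tokenizer.scan("欢迎来到松江临港科技城"));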

Output:

PipelineTokenizer

BestPathAlgorithm = ViterbiBestPathAlgorithm
CharNormalize = DefaultCharNormalize
WordTermCollector = SentenceCollector
WordSplitAlgorithm = CoreDictionarySplitAlgorithm,AtomSplitAlgorithm
WordpathProcessor = 
    CustomDictionaryProcessor

欢迎 来到 松江 临港 科技城
欢迎 来到 松江 临港科技城
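
To make the difference between the two output lines concrete, here is a minimal, self-contained sketch of the observable effect: adjacent tokens are merged when their concatenation is a word in the custom dictionary. This is only an illustration and not mynlp's implementation; the class `CustomDictMergeSketch` and its `merge` method are hypothetical names, and the real `CustomDictionaryProcessor` may work differently (for example, this toy version only finds custom words that line up with existing token boundaries).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is NOT part of the mynlp API.
class CustomDictMergeSketch {

    /**
     * Scans left to right. At each position, takes the longest run of
     * adjacent tokens whose concatenation is a custom word; if there is
     * no such run, the single token is kept unchanged.
     */
    static List<String> merge(List<String> tokens, Set<String> customWords) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String best = tokens.get(i); // default: keep the token as is
            int bestEnd = i + 1;
            StringBuilder candidate = new StringBuilder(tokens.get(i));
            for (int j = i + 1; j < tokens.size(); j++) {
                candidate.append(tokens.get(j));
                if (customWords.contains(candidate.toString())) {
                    best = candidate.toString(); // remember the longest match so far
                    bestEnd = j + 1;
                }
            }
            result.add(best);
            i = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("欢迎", "来到", "松江", "临港", "科技城");

        // Empty dictionary: tokens are unchanged.
        System.out.println(String.join(" ", merge(tokens, Set.of())));
        // prints: 欢迎 来到 松江 临港 科技城

        // With the custom word, the last two tokens are merged.
        System.out.println(String.join(" ", merge(tokens, Set.of("临港科技城"))));
        // prints: 欢迎 来到 松江 临港科技城
    }
}
```

In the actual library, the equivalent step is triggered by `memCustomDictionary.addWord(...)` followed by `memCustomDictionary.rebuild()`, as in the snippet above.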