nakagami / Awabi.jl

A morphological analyzer using mecab dictionary
MIT License

userdic does not work #3

Closed ujimushi closed 2 years ago

ujimushi commented 2 years ago

I tried the following code.

using Awabi

# Prepare to build a test user dictionary
test_dir = joinpath(ENV["HOME"], "test_awabi");
userdic_src = joinpath(test_dir, "user_dic.csv");
userdic = joinpath(test_dir, "user.dic");
dicdir = "/var/lib/mecab/dic/debian";
dic_src = "/usr/share/mecab/dic/ipadic";
cmd_idx = "/usr/lib/mecab/mecab-dict-index";
if !isdir(test_dir)
    mkdir(test_dir);
end
open(userdic_src, "w") do io
    write(io,
          ("ユーザー辞書,,,6058,名詞,一般,*,*,*,*," *
              "ユーザー辞書,ユーザージショ,ユーザージショ\n"));
end

# Build the test user dictionary
run(`$cmd_idx -d $dic_src -u $userdic -f utf-8 -t utf-8 $userdic_src`);

# Test
tk = Awabi.Tokenizer(Dict("dicdir" => dicdir, "userdic" => userdic));
tokenize(tk, "ユーザー辞書が動いてないかもしれない")

Here is the result.

julia> include("/home/ujimushi/test/awabi_user_dic_test.jl")
reading /home/ujimushi/test_awabi/user_dic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
ERROR: LoadError: UndefVarError: user_dic not defined
Stacktrace:
 [1] build_lattice(tokenizer::Tokenizer, sentence::String)
   @ Awabi ~/.julia/packages/Awabi/NqVsn/src/tokenizer.jl:71
 [2] tokenize(tokenizer::Tokenizer, s::String)
   @ Awabi ~/.julia/packages/Awabi/NqVsn/src/tokenizer.jl:110
 [3] top-level scope
   @ ~/test/awabi_user_dic_test.jl:24
 [4] include(fname::String)
   @ Base.MainInclude ./client.jl:476
 [5] top-level scope
   @ REPL[2]:1
in expression starting at /home/ujimushi/test/awabi_user_dic_test.jl:24

It looks like `user_dic` is not defined at this line:

https://github.com/nakagami/Awabi.jl/blob/926bee9a875dc098c091fec23364feed98ab6766/src/tokenizer.jl#L71

In package mode, I ran `dev Awabi` and made the following change.

diff --git a/src/tokenizer.jl b/src/tokenizer.jl
index a1d59c7..cb198cc 100644
--- a/src/tokenizer.jl
+++ b/src/tokenizer.jl
@@ -68,7 +68,7 @@ function build_lattice(tokenizer::Tokenizer, sentence::String)::Lattice

         # user_dic
         if tokenizer.user_dic != nothing
-            user_entries = lookup(user_dic, s[(pos+1):length(s)])
+            user_entries = lookup(tokenizer.user_dic, s[(pos+1):length(s)])
             if length(user_entries) > 0
                 for e in user_entries
                     add!(lattice, new_node(e), tokenizer.matrix)

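For reference, the bug pattern can be reproduced in plain Julia. Inside a method, a struct field is only reachable through the argument, so a bare reference to the field name looks up a (nonexistent) global and throws `UndefVarError`, exactly as in the stack trace above. This is a minimal sketch with hypothetical names (`MiniTokenizer`, `lookup_broken`, `lookup_fixed`), not Awabi's actual types:

```julia
# A stand-in for the Tokenizer struct, holding an optional user dictionary.
struct MiniTokenizer
    user_dic::Union{Nothing, Vector{String}}
end

function lookup_broken(t::MiniTokenizer)
    # BUG (as in the pre-patch code): `user_dic` is resolved as a global
    # variable, not as a field of `t`, so calling this throws UndefVarError.
    return user_dic
end

function lookup_fixed(t::MiniTokenizer)
    # Correct (as in the patch): qualify the field with the argument.
    return t.user_dic === nothing ? String[] : t.user_dic
end

t = MiniTokenizer(["ユーザー辞書"])
println(lookup_fixed(t))            # prints ["ユーザー辞書"]
try
    lookup_broken(t)
catch e
    println(e isa UndefVarError)    # prints true
end
```

The patch above applies exactly this change: replacing the bare `user_dic` with `tokenizer.user_dic`.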
I restarted Julia and tried again.

julia> include("/home/ujimushi/test/awabi_user_dic_test.jl")
[ Info: Precompiling Awabi [b89ecf66-93e0-42cf-a85d-3fd691c1774b]
reading /home/ujimushi/test_awabi/user_dic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
8-element Vector{Tuple{String, String}}:
 ("ユーザー辞書", "名詞,一般,*,*,*,*,ユーザー辞書,ユーザージショ,ユーザージショ")
 ("が", "助詞,格助詞,一般,*,*,*,が,ガ,ガ")
 ("動い", "動詞,自立,*,*,五段・カ行イ音便,連用タ接続,動く,ウゴイ,ウゴイ")
 ("て", "動詞,非自立,*,*,一段,未然形,てる,テ,テ")
 ("ない", "助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ")
 ("かも", "助詞,副助詞,*,*,*,*,かも,カモ,カモ")
 ("しれ", "動詞,自立,*,*,一段,未然形,しれる,シレ,シレ")
 ("ない", "助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ")

It seems to work now.

nakagami commented 2 years ago

Thanks!