Open · coleea opened this issue 7 years ago
Hi. When I run 'mecab-dict-index', an error occurs. The log output looks like this:
==============================================================================
reading ./ETN.csv ... 14
reading ./LISTEN_NER.csv ... 2081
reading ./Preanalysis.csv ... 5
reading ./TV_fullKorean_dict.csv ... 1687814
reading ./NP.csv ... 342
reading ./EF.csv ... 1820
reading ./XSA.csv ... 20
reading ./MM.csv ... 453
reading ./keyword.csv ... 276
reading ./XPN.csv ... 83
reading ./unk_word 1 1 0 (2nd).csv ... 276
reading ./Inflect.csv ... 44850
reading ./VA.csv ... 2360
reading ./XSV.csv ... 24
reading ./keyword_etc.csv ... 222
reading ./Place.csv ... 30300
reading ./LISTEN_unk_word 1 1 9.csv ... 254
reading ./LISTEN_KEYWORD.csv ... 2
reading ./sejong21_word.csv ... 846637
reading ./NNP.csv ... 2371
reading ./Hanja.csv ... 124570
reading ./EP.csv ... 51
reading ./KOR_ENG_csv.csv ... 60365
reading ./sejong21_verbal2.csv ... 15160
reading ./Foreign.csv ... 11599
reading ./NR.csv ... 482
reading ./NNB.csv ... 140
reading ./LISTEN_unk_word.csv ... 254
reading ./Wikipedia.csv ... 36763
reading ./sejong21_fusion.csv ... 1321382
reading ./VCN.csv ... 7
reading ./NNG.csv ... 205269
reading ./MAG.csv ... 14244
reading ./Person-actor.csv ... 99237
reading ./Symbol.csv ... 16
reading ./VCP.csv ... 9
reading ./VX.csv ... 125
reading ./Person.csv ... 196461
reading ./Group.csv ... 3176
reading ./XSN.csv ... 124
reading ./ETM.csv ... 133
reading ./NorthKorea.csv ... 3
dictionary.cpp(472) [da.build(str.size(), const_cast<char **>(&str[0]), &len[0], &val[0], &progress_bar_darts) == 0] unkown error in building double-array
==============================================================================
[dictionary.cpp] lines 472-476 look like this:
for (size_t i = 0; i < dic.size(); ++i) {
  tbuf.append(reinterpret_cast<const char*>(dic[i].second),
              sizeof(Token));
  delete dic[i].second;
}
==============================================================================
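For context, the check that fails in the log is the point where mecab-dict-index hands the dictionary keys (sorted, as Darts requires) to the bundled Darts double-array (trie) library. The snippet below is a minimal, hypothetical sketch of that build step, assuming a darts.h on the include path whose build() returns 0 on success, which is what the "== 0" check quoted in the error message implies; the keys and values are invented for illustration, and the const_cast mirrors the call shown in the log.
==============================================================================
// Hypothetical sketch of the double-array build that mecab-dict-index
// performs internally. Assumes the Darts header bundled with MeCab
// (darts.h) is on the include path; keys and values are made-up data.
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

#include <darts.h>

int main() {
  // Keys must be unique and lexicographically sorted before build().
  std::vector<const char*> keys;
  keys.push_back("apple");
  keys.push_back("banana");
  keys.push_back("cherry");

  std::vector<std::size_t> lengths;
  std::vector<int> values;  // in MeCab these encode token offsets
  for (std::size_t i = 0; i < keys.size(); ++i) {
    lengths.push_back(std::strlen(keys[i]));
    values.push_back(static_cast<int>(i));
  }

  Darts::DoubleArray da;
  // The "== 0" success convention and the const_cast follow the call that
  // appears in the error message; a non-zero result is what MeCab reports
  // as "unkown error in building double-array".
  if (da.build(keys.size(), const_cast<char**>(&keys[0]),
               &lengths[0], &values[0]) != 0) {
    std::cerr << "error in building double-array" << std::endl;
    return 1;
  }
  std::cout << "double-array built, units = " << da.size() << std::endl;
  return 0;
}
==============================================================================
This sketch only illustrates the call site; it does not diagnose why the build fails for the 1.6M-entry dictionary.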
This error occurred when I added 'TV_fullKorean_dict.csv', which contains 1,687,814 entries. The file size is 165.8 MB. Is there any limit on the CSV file size?
Thank you

I have exactly the same problem when I try to increase the size of the dictionary. A reply would be appreciated.

I have exactly the same problem. It runs when I separate the dictionary, but it fails when I try to apply it to the original file.
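Since one comment notes that the build succeeds once the dictionary is separated, one possible workaround (a sketch only, not a confirmed fix) is to split the oversized CSV into several smaller files and pass all of the pieces to mecab-dict-index together, just like the many other CSVs already listed in the log. The input file name comes from the report, but the output naming and the 200,000-line chunk size below are arbitrary choices for illustration.
==============================================================================
// Hypothetical workaround sketch: split a very large dictionary CSV into
// smaller pieces that can be fed to mecab-dict-index as separate files.
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  const std::string input = "TV_fullKorean_dict.csv";
  const std::size_t lines_per_chunk = 200000;  // arbitrary chunk size

  std::ifstream in(input.c_str());
  if (!in) {
    std::cerr << "cannot open " << input << std::endl;
    return 1;
  }

  std::string line;
  std::size_t line_count = 0;
  std::size_t chunk_index = 0;
  std::ofstream out;

  while (std::getline(in, line)) {
    if (line_count % lines_per_chunk == 0) {
      // Start a new chunk file, e.g. TV_fullKorean_dict_part0.csv.
      if (out.is_open()) out.close();
      std::ostringstream name;
      name << "TV_fullKorean_dict_part" << chunk_index++ << ".csv";
      out.open(name.str().c_str());
      if (!out) {
        std::cerr << "cannot open " << name.str() << std::endl;
        return 1;
      }
    }
    out << line << "\n";
    ++line_count;
  }

  std::cout << "wrote " << chunk_index << " chunk files ("
            << line_count << " lines total)" << std::endl;
  return 0;
}
==============================================================================
Each line of a MeCab dictionary CSV is an independent entry, so splitting on line boundaries does not change the dictionary contents; whether it actually avoids the double-array error in this case is untested.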