Closed ghost closed 6 years ago
I created ja_trie by running this:
./extraction/full_preprocess.sh ${DATA_DIR} ja
After that, I checked it like this:
from os.path import join
import marisa_trie

language_path = "../data/ja_trie/"
trie = marisa_trie.Trie().load(join(language_path, "trie.marisa"))
assert trie.get('アメリカ') is not None
This works. But if the key contains any alphabetic character, the lookup returns nothing:
assert trie.get('CIA') is not None
AssertionError                            Traceback (most recent call last)
<ipython-input-11-41516e200beb> in <module>()
----> 1 assert trie.get('CIA') is not None

AssertionError:
jawiki definitely contains 'CIA' as anchor text, so why does this happen?
The key 'CIA' has to be lowercased, and Japanese full-width (multi-byte) Latin letters have to be converted to ASCII:
trie.get('cia')  # works
trie.get('CIA')  # does not work
trie.get('ｃｉａ')  # does not work (full-width letters)
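For anyone hitting the same thing, here is a minimal sketch of normalizing a query before the lookup. It assumes that lowercasing plus Unicode NFKC folding matches what the preprocessing did to the keys, which I have not verified against the extraction scripts. `normalize_key` is a hypothetical helper name, and a plain dict stands in for the trie so the snippet runs without the data files.

```python
import unicodedata


def normalize_key(text: str) -> str:
    """Fold full-width Latin letters to ASCII via NFKC, then lowercase.

    Assumption: this mirrors the normalization applied when the trie
    was built; adjust it if the preprocessing does something different.
    """
    return unicodedata.normalize("NFKC", text).lower()


# Stand-in for the real trie (hypothetical contents, for illustration only).
fake_trie = {"cia": 0, "アメリカ": 1}

for query in ("CIA", "ＣＩＡ", "cia", "アメリカ"):
    key = normalize_key(query)
    # With the real trie this would be trie.get(key).
    print(repr(query), "->", repr(key), fake_trie.get(key))
```

With the real trie, the call would be `trie.get(normalize_key('CIA'))` instead of `trie.get('CIA')`.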