rime / librime

Rime Input Method Engine, the core library
https://rime.im
BSD 3-Clause "New" or "Revised" License

Error when reading `build/...table.bin` #323

Open sih4sing5hong5 opened 5 years ago

sih4sing5hong5 commented 5 years ago

We get an error when loading the compiled table.bin:

E1104 07:04:07.984189     7 table.cc:316] invalid metadata.
E1104 07:04:07.984838     7 deployment_tasks.cc:373] dictionary 'taigi_pojhanlo' failed to compile.
ready.
E1104 07:04:08.002444     6 table.cc:316] invalid metadata.
E1104 07:04:08.002562     6 dictionary.cc:263] Error loading table for dictionary 'taigi_pojhanlo'.
E1104 07:04:08.024273     6 table.cc:316] invalid metadata.

Failing YAML

taigi_pojhanlo.dict.yaml

# Rime dictionary
# encoding: utf-8

---
name: taigi_pojhanlo
version: "1.0.0"
sort: by_weight
use_preset_vocabulary: false
import_tables:
  - taigi_pojhanlo.extended
...
Peh-ōe-jī   Peh8 oe7 ji7

The resulting taigi_pojhanlo.table.bin:

$ strings taigi_pojhanlo.table.bin 
We love Marisa.
Rime::Table/4.0
$ hexdump taigi_pojhanlo.table.bin 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000020 b60b c503 0002 0000 0001 0000 0018 0000
0000030 0020 0000 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0002 0000 0001 0000 0000 0000
0000050 0002 0000 0000 0000 0014 0000 0010 0000
0000060 0000 0000 0000 0000 0000 0000 0001 0000
...
0001000 6952 656d 3a3a 6154 6c62 2f65 2e34 0030  # Rime::Table/4.0
0001010 0000 0000 0000 0000 0000 0000 0000 0000
0001020 ffff ffff 0000 0080 ffff ffff ffff ffff
0001030 0000 0080 ffff ffff ffff ffff f04c ffff
0001040 1008 0000 ffff ffff 0000 0080 ffff ffff

Passing YAML (identical except that the entry text is `Peh` instead of `Peh-ōe-jī`)

taigi_pojhanlo.dict.yaml

# Rime dictionary
# encoding: utf-8

---
name: taigi_pojhanlo
version: "1.0.0"
sort: by_weight
use_preset_vocabulary: false
import_tables:
  - taigi_pojhanlo.extended
...
Peh Peh8 oe7 ji7

The resulting taigi_pojhanlo.table.bin:

$ strings taigi_pojhanlo.table.bin 
Rime::Table/4.0
We love Marisa.
$ hexdump taigi_pojhanlo.table.bin 
0000000 6952 656d 3a3a 6154 6c62 2f65 2e34 0030 # Rime::Table/4.0
0000010 0000 0000 0000 0000 0000 0000 0000 0000
0000020 bde6 dcd9 0002 0000 0001 0000 0018 0000
0000030 0020 0000 0000 0000 0000 0000 004c 0000
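The two dumps differ in where the format string sits. In the passing table, `Rime::Table/4.0` is at offset 0. In the failing one, offset 0 holds zeros and the string appears only at offset 0x1000. The sketch below checks a compiled table for this. It assumes, based only on these two dumps and not on librime's source, that a loadable table.bin begins with the NUL-padded `Rime::Table/...` string in its first 32 bytes. `inspect_table_header` and `describe` are hypothetical helper names.

```python
import sys

# Assumption (inferred from the two hexdumps, not from librime's source):
# a loadable table.bin starts with the NUL-padded format string
# "Rime::Table/<version>" at offset 0.
FORMAT_PREFIX = b"Rime::Table/"
FORMAT_FIELD_SIZE = 32  # hypothetical: bytes 0x00-0x1f held the string in the dumps


def inspect_table_header(data: bytes) -> tuple[bool, int]:
    """Return (format string at offset 0?, offset of first occurrence or -1)."""
    offset = data.find(FORMAT_PREFIX)
    return offset == 0, offset


def describe(path: str) -> str:
    """Summarize where the format string appears in the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    at_start, offset = inspect_table_header(data)
    if at_start:
        field = data[:FORMAT_FIELD_SIZE].split(b"\0", 1)[0]
        return f"{path}: header at offset 0 ({field.decode('ascii')})"
    if offset < 0:
        return f"{path}: no '{FORMAT_PREFIX.decode()}' string found"
    return f"{path}: format string at offset {offset:#x}, not at 0"


if __name__ == "__main__":
    # Each command-line argument is a path to a compiled table.bin.
    for arg in sys.argv[1:]:
        print(describe(arg))
```

Run it as, for example, `python check_table.py path/to/taigi_pojhanlo.table.bin`. The script name is arbitrary. For the failing dump, the check would report the string at offset 0x1000.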

taigi_pojhanlo.schema.yaml

# Rime schema
# encoding: utf-8
#
# 台語POJ漢羅輸入法 (Taiwanese POJ Hàn-lô input method)
#

schema: # input method schema
  schema_id: taigi_pojhanlo # input method ID
  name: 台語POJ漢羅輸入法  # input method name
  version: "1.0.0"  # version number
  description: 台語POJ漢羅輸入法
  author:
    - Ngô͘ Hê-bí <ngoohebi@gmail.com>

switches:
  - name: ascii_mode
    states: ["台文", "英文"]
    reset: 0   # default: 0 = Taiwanese (台文), 1 = English (英文)
  - name: full_shape
    states: ["半形羅馬字","全形羅馬字"]
    reset: 0    # 0 = half-width, 1 = full-width
#  - extended_charset
#    reset: 1    # 0 = basic CJK character set, 1 = full CJK character set
  - name: ascii_punct
#    states: ["全形羅馬字符號", "半形羅馬字符號"]
    reset: 1    # 0 = full-width punctuation, 1 = half-width punctuation
  - name: zh_tw
    reset: 1

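engine: # input method engine
  processors:   # core processors
    - ascii_composer    # handles English mode and switching between Chinese and English
    - recognizer    # works with matcher; tags input codes matching specific patterns, such as URLs and reverse lookup
    - key_binder    # under specific conditions, remaps a key to another key
    - speller   # spelling processor: accepts character keys and edits the input code
    - punctuator    # punctuation processor: maps a single character key directly to a punctuation mark or text
    - selector  # selects candidates and turns pages
    - navigator # moves the insertion point
#    - express_editor    # editor: handles Space, commits to the screen on Enter, handles the Backspace key
    - fluid_editor    # sentence editor for schemes such as Zhuyin and "sentence flow" that segment words with Space and commit with Enter; replaces express_editor. Can also be written as fluency_editor.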

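  segmentors:   # segment tagging
    - ascii_segmentor   # tags English segments (e.g. in English mode); letters go straight to the screen
    - matcher   # works with recognizer to give segments matching specific patterns, such as URLs and reverse lookup, specific tags
    - abc_segmentor # tags input codes with the type 'abc'
    - punct_segmentor   # tags punctuation segments as 'punct'
    - fallback_segmentor    # joins the input code into one segment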

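  translators:  # translate input code segments into a set of candidate texts
    - punct_translator  # works with punct_segmentor to convert punctuation
    - table_translator@custom_phrase    # custom phrases
    - table_translator  # code-table translator, for table-based schemes such as Cangjie and Wubi
    - script_translator # script translator, for syllable-based schemes such as Pinyin and Jyutping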

  filters:
    - uniquifier    # filters out duplicate candidates

menu:
  alternative_select_keys: "qwydfzxv"
  page_size: 8

speller:    # spelling algebra explained: https://github.com/rime/home/wiki/SpellingAlgebra
 alphabet: 'abceghijklmnoprstuABCEGHIJKLMNOPRSTU123456789'
 initials: 'abceghijklmnoprstuABCEGHIJKLMNOPRSTU'
 finals: "123456789"
 delimiter: " -"
 use_space: true
 auto_select: false

translator: # translates code segments of specific types into a set of candidate texts
  dictionary: taigi_pojhanlo    # dictionary name used by table_translator: xxx.dict.yaml
  initial_quality: 0

custom_phrase:
  dictionary: ""
  user_dict: taigi_pojhanlo.custom_phrase
  db_class: tabledb
  enable_completion: false
  enable_sentence: false
  initial_quality: 0

lotem commented 5 years ago

The "invalid metadata" error means the dictionary was not built successfully. After deploying, read the INFO log, not just the lines starting with E (those are only the errors), to see whether anything went wrong during deployment. When I deploy your files, I get:

E1107 10:18:53.584096 24928256 dict_compiler.cc:67] source file '/Users/tada/Library/Rime/taigi_pojhanlo.extended.dict.yaml' does not exist.
E1107 10:18:53.584460 24928256 deployment_tasks.cc:373] dictionary 'taigi_pojhanlo' failed to compile.

because I don't have taigi_pojhanlo.extended.dict.yaml, the file that your taigi_pojhanlo.dict.yaml imports through import_tables.
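Because a missing imported table is enough to make the compile fail, it can help to check the files before deploying. The log above shows that the import `taigi_pojhanlo.extended` is looked up as `taigi_pojhanlo.extended.dict.yaml`. This sketch assumes that mapping and searches only the directory containing the dictionary. If Rime also searches other data directories, a file reported missing here may still be found there. It only understands the block-list form of `import_tables:` and is not a general YAML parser. `header_lines`, `imported_tables` and `missing_imports` are hypothetical helper names.

```python
import sys
from pathlib import Path


def header_lines(text: str) -> list[str]:
    """Lines of the YAML header between '---' and '...' (or end of file)."""
    lines = [line.rstrip() for line in text.splitlines()]
    start = lines.index("---") + 1
    end = lines.index("...", start) if "..." in lines[start:] else len(lines)
    return lines[start:end]


def imported_tables(text: str) -> list[str]:
    """Collect the names listed under 'import_tables:' (block-list form only)."""
    tables: list[str] = []
    in_list = False
    for line in header_lines(text):
        stripped = line.strip()
        if stripped.startswith("import_tables:"):
            in_list = True
        elif in_list and stripped.startswith("- "):
            # Drop any trailing comment after the table name.
            tables.append(stripped[2:].split("#", 1)[0].strip())
        elif stripped and not stripped.startswith("#"):
            in_list = False  # any other key ends the list
    return tables


def missing_imports(text: str, existing_files: set[str]) -> list[str]:
    """'<name>.dict.yaml' files required by import_tables but not present."""
    wanted = [f"{name}.dict.yaml" for name in imported_tables(text)]
    return [name for name in wanted if name not in existing_files]


if __name__ == "__main__":
    # Each argument is a path to a *.dict.yaml file.
    for arg in sys.argv[1:]:
        path = Path(arg)
        existing = {p.name for p in path.parent.iterdir()}
        for name in missing_imports(path.read_text(encoding="utf-8"), existing):
            print(f"{path.name}: imported file '{name}' does not exist")
```

Keeping the parsing separate from the directory listing lets `missing_imports` be tested without touching the file system.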

sih4sing5hong5 commented 5 years ago

Thanks a lot.

Our taigi_pojhanlo.extended.dict.yaml is currently empty (a header with no entries). We also tried removing import_tables, but it still failed.

$ cat taigi_pojhanlo.extended.dict.yaml 
# Rime dictionary
# encoding: utf-8
---
name: taigi.extended
version: "1.0.0"
sort: by_weight
use_preset_vocabulary: false
...

Another question: after tracing the code from deployment_tasks.cc through dict_compiler.cc, entry_collector.cc and table.cc down to MappedFile, I still couldn't find where Rime actually performs file I/O. We added debug output in EntryCollector::Collect, and it seems that the entries were parsed correctly.
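One possible reason no explicit I/O call shows up: if MappedFile is a memory-mapped file, as its name suggests (its internals are not shown in this thread), the table would be written by storing into mapped memory rather than by calling a write function. The operating system then writes the dirty pages back to the file. The sketch below illustrates that general pattern with Python's standard `mmap` module. It is not librime's code, and `write_via_mmap` and `demo` are hypothetical helpers.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, payload: bytes, capacity: int) -> None:
    """Size the file first, then fill it by writing into mapped memory."""
    with open(path, "wb") as f:
        f.truncate(capacity)  # sets the file length; no data written via f.write()
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), capacity) as mapped:
        mapped[: len(payload)] = payload  # an ordinary memory store
        mapped.flush()  # ask the OS to write the modified pages back to the file


def demo() -> bytes:
    """Write a small header through a mapping and read the file back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.table.bin")
        write_via_mmap(path, b"Rime::Table/4.0\0", capacity=4096)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    data = demo()
    print(len(data), data[:16])
```

When tracing a mapped-file writer, look for where pointers into the mapped region are assigned rather than for `write` or stream calls.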

By the way, we're using behavior-driven development to write our schema.yaml and dict.yaml. Here are our code and CI.