What are the four functions for?

shewer commented 4 years ago

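This is interesting and I would like to understand it. Usually a dict and a schema are enough, so what are these four functions for?

english_processor = english.processor
english_segmentor = english.segmentor
english_translator = english.translator
english_filter = english.filter

Also, in filter0 the ReverseDb should be opened only once per schema, in filter_init, and attached to env so that the filter can use it: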


function init_filter_func(env)
    -- open the reverse-lookup database once and keep it on env
    env.reverdb = ReverseDb("......")
end

function filter_func(input, env)
    -- str: the text to look up, e.g. a candidate's text
    env.reverdb:lookup(str)
end

local reverse_lookup_tab = {
    reverse = { init = init_filter_func, func = filter_func },
    processor = { init = init_proc, func = processor },
    translator = translator,
}

filter = reverse_lookup_tab.reverse
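The "open once in init, reuse in func" pattern can be tried outside Rime with a stand-in database. This is only a sketch: `open_fake_db` and its sample data are hypothetical and replace the real reverse-lookup database, and `filter_func` here takes a plain string instead of a Rime translation.

```lua
-- Hypothetical stand-in for a reverse-lookup database; it counts how often it is opened.
local open_count = 0
local function open_fake_db(path)
  open_count = open_count + 1
  local data = { apple = "n. a fruit" }  -- sample data, not from the real dictionary
  return { lookup = function(self, key) return data[key] or "" end }
end

local function init_filter_func(env)
  env.reverdb = open_fake_db("fake.reverse.bin")  -- opened exactly once
end

local function filter_func(text, env)
  return env.reverdb:lookup(text)  -- reuses the handle stored on env
end

local env = {}
init_filter_func(env)
local first = filter_func("apple", env)
local second = filter_func("pear", env)
print(open_count, first, second)
```

However many times `filter_func` runs, the database is opened only by the init function.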

sdadonkey commented 4 years ago

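english_processor handles all key events, letters in particular. In ASCII mode letters are ignored by default and committed directly, so they have to be intercepted and passed on to english_segmentor. english_segmentor converts all letters to lowercase before they are sent to the table_translator for lookup, because the dictionary prism has been converted entirely to lowercase. english_translator is currently unused and is kept as a placeholder. english_filter is mainly responsible for swapping text and comment for display, restoring letter case, wildcard filtering, and so on.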

sdadonkey commented 4 years ago

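I need to test your suggestion. If it works, english_filter0 can probably be simplified. However, in my current dictionary every word and every annotation is unique, so no reverse lookup is needed and only english_filter is used.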

shewer commented 4 years ago

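My approach is to build a shared word table in Lua and let a lua_translator look words up in it and yield candidates, though I am not sure how well that performs. Alternatively, use a table_translator with a word-to-word dictionary, which makes committing words easy, and let a lua_filter use the Lua table on its own to process the data.

The lua_filter looks the word up in the Lua table, takes the dict entry, applies gsub to it, and yields several times: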

local dict_table   -- word => translation

function translator(input, seg, env)
    for w, dic in pairs(dict_table) do
        -- case-insensitive prefix match on the word
        if w:lower():match("^" .. input .. ".*") then
            yield(Candidate("dict", seg.start, seg._end, input, dic))
        end
    end
end

function filter(input, env)
    for cand in .....
        if cand.type == "dict" then
            yield( ... )
        end
    end
end

 --   input

sdadonkey commented 4 years ago

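Bear in mind that an English dictionary should normally be sorted alphabetically rather than single-character entries first and longer ones after; otherwise it is not a dictionary. So a table.sort before yield is unavoidable.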
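A minimal sketch of that point: `pairs` returns keys in no particular order, so the matches are collected first and sorted before anything is yielded. The function name `sorted_matches` and the sample words are assumptions for illustration, and the results are returned as a list instead of being yielded as candidates.

```lua
-- Collect case-insensitive prefix matches, then sort them alphabetically.
local function sorted_matches(dict, input)
  local prefix = input:lower()
  local words = {}
  for word in pairs(dict) do
    if word:lower():sub(1, #prefix) == prefix then
      words[#words + 1] = word
    end
  end
  table.sort(words, function(a, b)
    local la, lb = a:lower(), b:lower()
    if la ~= lb then return la < lb end
    return a < b  -- stable tie-break for words differing only in case
  end)
  return words
end

local dict = { auto = "...", Austria = "...", autumn = "...", apple = "..." }
local result = sorted_matches(dict, "au")
print(table.concat(result, " "))
```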

sdadonkey commented 4 years ago

Also, by 單字 do you mean a single letter or a whole word? If you mean single letters, this approach has little benefit.

sdadonkey commented 4 years ago

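Also, in my schema the filter must come after the uniquifier, because once the annotations are expanded many candidates have an empty text and the uniquifier would drop them.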

shewer commented 4 years ago

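Quoting sdadonkey: "Also, in my schema the filter must come after the uniquifier, because once the annotations are expanded many candidates have an empty text and the uniquifier would drop them."

It goes after it.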

shewer commented 4 years ago

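Quoting sdadonkey: "Also, by 單字 do you mean a single letter or a whole word? If you mean single letters, this approach has little benefit."

English words.

First split english.dict.yaml into its header and the dictionary body (english.dict.head and english.txt).

# merge english.dict.head and english.txt
# gawk skips lines starting with # and prints word<TAB>_word
( cat english.dict.head ;  gawk -e 'BEGIN{FS=OFS="\t"} /^#/ {next} $1 != "" { print $1,"_"$1}' english.txt |head ) > english.dict.yaml

Resulting dict.yaml entries:
word0	_word0
word1	_word1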

--  english.lua
-- english.txt goes in the USERDIR directory
USERDIR = os.getenv("APPDATA") or ""
USERDIR = USERDIR .. "\\Rime"

function string:split(sep)
    sep = sep or "%s"
    local t = {}
    for str in self:gmatch("([^" .. sep .. "]+)") do
        table.insert(t, str)
    end
    return t
end

local function init_dict(filename)
    local file_dict = io.open(USERDIR .. "\\" .. filename)
    local dict_table = {}
    for line in file_dict:lines() do
        local word, info = table.unpack(line:split("\t"))
        dict_table[word] = info
    end
    file_dict:close()
    return dict_table
end

local function filter(input, env)
    for cand in input:iter() do
        local info = env.dict[cand.text:sub(2)] -- sub(2) drops the first character
        local ar = conver_info(info) -- split the translation string into an array of N strings
        for _, commit_text in ipairs(ar) do
            yield(     )  --
        end
    end
end

local function init_filter(env)
    env.dict = init_dict("english.txt")
end

shewer commented 4 years ago

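Quoting sdadonkey: "Also, by 單字 do you mean a single letter or a whole word? If you mean single letters, this approach has little benefit."

Yes. The simple way is to use table_translator to help commit candidates; the troublesome way is to build your own dictionary tree. To integrate with the main input method, this schema is better used as a secondary schema.

For example, add patterns english: "^E.+" and use table_translator directly. Then a single lua_filter is enough, and the processor and segmentor can both be dropped.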

shewer commented 4 years ago

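Moreover, the first candidate must always be the English word, so that the space key commits it. Later a switch could be added so that the second and third candidates become Chinese.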

sdadonkey commented 4 years ago

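Quoting shewer: "The simple way is to use table_translator to help commit candidates; the troublesome way is to build your own dictionary tree. To integrate with the main input method, this schema is better used as a secondary schema. For example, add patterns english: "^E.+" and use table_translator directly. Then a single lua_filter is enough, and the processor and segmentor can both be dropped."

With the recognizer there are too many patterns to remember, so this schema is implemented directly in ASCII mode, completely independent of the input method used in Chinese mode.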

sdadonkey commented 4 years ago

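Moreover, without a segmentor that lowercases the input code, the dictionary file would have to look like the one in https://github.com/BlindingDark/rime-easy-en, with at least three spellings prepared for every word.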

shewer commented 4 years ago

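Quoting sdadonkey: "Bear in mind that an English dictionary should normally be sorted alphabetically rather than single-character entries first and longer ones after; otherwise it is not a dictionary. So a table.sort before yield is unavoidable."

english.schema.yaml produces table.bin and prism.bin; could that simplify the table.sort problem? Add a recognizer/patterns entry so that the tag is routed to english. Since english.table.bin has no or very few single-character entries, the filter that puts single characters before longer entries would be bypassed. Alternatively use tags: [ ]. As far as I remember, a filter accepts only tag abc by default.

Main schema custom.yaml:
patch:
  schema/dependencies:
    - english
  recognizer/patterns/english: "^E.*"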

sdadonkey commented 4 years ago

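My main purpose in writing this schema is to see spelling hints and annotations while typing English text, not to look up a word occasionally while typing Chinese. In other words, whether or not you enable this schema should not affect how you normally type English. Of course, to make it easier for users to adapt, the Enter key currently only commits the candidate and does not send an additional newline.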

shewer commented 4 years ago

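Quoting sdadonkey: "Moreover, without a segmentor that lowercases the input code, the dictionary file would have to look like the one in https://github.com/BlindingDark/rime-easy-en, with at least three spellings prepared for every word."

I just looked at english.dict.yaml: almost no word is duplicated with different case, and even where a word has uppercase letters, lowercase input can still find the candidate and commit it.

Building prism.bin through the speller should work, with speller/algebra:

- derive/(.*)/\L$1/   # derive (keeping the original form): add an all-lowercase spelling
- derive/(.*)/\U$1/   # derive (keeping the original form): add an all-uppercase spelling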

_AI AI --> _AI AI
_AI ai _Ai Ai --> Ai Ai Ai ai

_Alcatel Alcatel --> _Alcatel Alcatel _Alcatel alcatel

sdadonkey commented 4 years ago

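But there will always be occasions to write StudlyCaps or CamelCase words. Must those then be typed letter by letter? Especially when programming: if words have to be split by hand, my goal that "whether or not you enable this schema should not affect how you normally type English" cannot be met.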

shewer commented 4 years ago

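Quoting sdadonkey: "My main purpose in writing this schema is to see spelling hints and annotations while typing English text, not to look up a word occasionally while typing Chinese. In other words, whether or not you enable this schema should not affect how you normally type English. Of course, to make it easier for users to adapt, the Enter key currently only commits the candidate and does not send an additional newline."

If English is primary, do it the other way round: the main dictionary english takes tag abc (the only one), and the secondary dictionaries all use patterns such as luna_pinyin: "^P.*". English words come first, with pinyin on top, yet the setup stays flexible and the Lua code stays clean.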

sdadonkey commented 4 years ago

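Quoting shewer: "If English is primary, do it the other way round: the main dictionary english takes tag abc (the only one), and the secondary dictionaries all use patterns such as luna_pinyin: "^P.*". English words come first, with pinyin on top, yet the setup stays flexible and the Lua code stays clean."

You could try going in that direction, but I expect quite a few related problems to solve, especially letter case, punctuation, wildcard keys and so on. Besides, automatically appending a space after a word has to be handled in a processor anyway.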

shewer commented 4 years ago

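For CamelCase, whether this is needed depends on whether cand.preedit or cand.text is committed, and that too is decided in the filter function:
for cand in input:iter() do
    -- commit what was typed (cand.preedit) instead of the dictionary spelling (cand.text)
    local cand_new = Candidate("english", cand.start, cand._end, cand.preedit, info)
    yield(cand_new)
end

If table.bin contains StudlyCaps or CamelCase words, typing them in lowercase still brings up the candidate. When several entries share the same code, the space key alone is not enough and you have to select one. That is a matter of what dict.yaml contains, e.g. _the the _The The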

sdadonkey commented 4 years ago

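Leaving aside the many words like "WinXP" and "AutoCAD" that you cannot anticipate, even with only the three cases of all uppercase, all lowercase and capitalized first letter, your dictionary becomes three times as large. Compare the sizes of the dictionary files in my schema and in rime-easy-en.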

shewer commented 4 years ago

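Quoting sdadonkey: "You could try going in that direction, but I expect quite a few related problems to solve, especially letter case, punctuation, wildcard keys and so on. Besides, automatically appending a space after a word has to be handled in a processor anyway."

I came across your schema because I was interested in the translation feature. Its design is hard to integrate with other schemas, and the lua_processor already fully intercepts the letters that ascii_mode would normally commit directly, so ascii_composer no longer has any effect. In other words, this schema has no real ascii_mode left: ascii_mode is only a switch that the code checks. One could add a separate switch and a key_binder entry instead.

Chinese typing mode is inherently different from English typing mode, and your schema still builds the English typing mode inside the Chinese typing mode.

I suggest trying fluid_editor, the sentence-style editor used by schemas such as Zhuyin and "sentence flow", where space separates words and Enter commits. Replacing express_editor with it might feel smoother.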

sdadonkey commented 4 years ago

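Quoting shewer: "Chinese typing mode is inherently different from English typing mode, and your schema still builds the English typing mode inside the Chinese typing mode. I suggest trying fluid_editor, the sentence-style editor used by schemas such as Zhuyin and "sentence flow", where space separates words and Enter commits. Replacing express_editor with it might feel smoother."

I thought about that before, but I expect it to be rather difficult, so I finished the express_editor version first.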

sdadonkey commented 4 years ago

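Quoting shewer: "Its design is hard to integrate with other schemas, and the lua_processor already fully intercepts the letters that ascii_mode would normally commit directly, so ascii_composer no longer has any effect. In other words, this schema has no real ascii_mode left."

There is no way around it. The author of Rime did not anticipate that someone would design an input method for ASCII mode, so in ascii_mode all keys only get the default handling and are not sent to any translator.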

shewer commented 4 years ago

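Quoting sdadonkey: "Leaving aside the many words like "WinXP" and "AutoCAD" that you cannot anticipate, even with only the three cases of all uppercase, all lowercase and capitalized first letter, your dictionary becomes three times as large. Compare the sizes of the dictionary files in my schema and in rime-easy-en."

If typing English comes first and the reference is secondary, I use the auto-completion of a code editor.

A question. Suppose the preedit only turns into AutoCAD when TAB is pressed at that point, and the candidate menu shows:
AutoCAD comment .......
........
automation .........

Then for each of these inputs, which should be committed?
autocad"space": autocad or AutoCAD?
Autocad"space": Autocad or AutoCAD?
autoc"space": autoc or AutoCAD?
Autoc"space": Autoc or AutoCAD?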

sdadonkey commented 4 years ago

Quoting shewer's question about which of autocad / Autocad / autoc / Autoc or AutoCAD should be committed when space is pressed:

Every part that has not been typed yet is automatically converted to the case of the preceding letter, so typing AutoC is enough. The same holds when a wildcard is used: the part matched by the wildcard takes its case from the preceding letter.
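The case rule described here can be sketched in a few lines of plain Lua. This is an illustration under assumptions, not the schema's actual filter code: `restore_case` is a hypothetical name, wildcards are left out, and the dictionary word is assumed to start with the typed letters when case is ignored.

```lua
-- Keep the typed letters as typed; give the untyped remainder of the word
-- the case of the last typed letter.
local function restore_case(typed, word)
  local rest = word:sub(#typed + 1)
  local last_is_upper = typed:sub(-1):match("%u") ~= nil
  if last_is_upper then
    return typed .. rest:upper()
  end
  return typed .. rest:lower()
end

print(restore_case("AutoC", "autocad"))  -- the trailing "ad" follows the upper-case C
print(restore_case("autoc", "autocad"))
print(restore_case("Au", "autocad"))
```

With this rule the dictionary only needs one lowercase spelling per word, which is the size argument made earlier in the thread.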

shewer commented 4 years ago

I just dug through my own lib and tested it under plain Lua. Performance is quite good, so it can be used to build a lua_translator.

dict_index -- table with metatable: list of words
dict_table -- table with metatable: key = word => translation
match( str) -- returns a predicate function for table:find_all
table:each(print)
dict_index:each(function(v,i) print( "word" ,v, "index",i) end )

table:find_all( func )
table:find_all( match("auto") ):each(print)
table:find_all( function(elm) return elm:lower():match( "^" .. ("auto"):lower() ) end )
table:find_all( function(elm,str) return elm:lower():match( "^" .. str:lower() ) end ,"auto")

function string.split( str, sp,sp1)
        if   type(sp) == "string"  then
                if sp:len() == 0 then
                        sp= "([%z\1-\127\194-\244][\128-\191]*)"
                elseif sp:len() > 1 then
                        sp1= sp1 or "^"
                        _,str= pcall(string.gsub,str ,sp,sp1)
                        sp=  "[^".. sp1.. "]*"

                else
                        if sp =="%" then
                                sp= "%%"
                        end
                        sp=  "[^" .. sp  .. "]*"
                end
        else
                sp= "[^" .. " " .."]+"
        end

        local tab= setmetatable( {} , {__index=table} )
        flag,res= pcall( string.gmatch,str,sp)
        for  v  in res   do
                tab:insert(v)
        end
        return tab
end

table.each=function(tab,func)  -- renamed from eacha: the examples call :each
        for i,v in ipairs(tab) do
                func(v,i)
        end
        return tab
end
table.find_all=function(tab,elm,...)
        local tmptab=setmetatable({} , {__index=table} )
        local _func=  (type(elm) == "function" and elm ) or  function(v,k, ... ) return  v == elm  end
        for k,v in pairs(tab) do
                if _func(v,k,...) then
                        tmptab:insert(v)
                end
        end
        return tmptab
end

-- test 
dict_file= io.open("english.txt")
dict_index=setmetatable({},{__index=table})
dict_info=setmetatable({},{__index=table})
for line in dict_file:lines() do 
     local word,info=line:split("\t"):unpack()
     dict_info[word]=info
     dict_index:insert(word)
end

function match(str)  --  return function for table:find_all(func)
     return function(elm_str) 
         return  elm_str:lower():match(  "^" .. str:lower() )   
     end
end 

dict_index:find_all(match("a") ):find_all(match("au")):each(print)

function dict_match(tab,input) 
   for i=1,#input do
      local substr= input:sub(1,i)  
      tab=tab:find_all( match( substr  )  )
      print(substr, #tab)
   end
   return tab
end

dict_match( dict_index, "auto") 
dict_match( dict_index , "auto.*tion")
t1=os.clock() ; dict_match( dict_index , "auto.*tion") ; print("runtime:" , os.clock() - t1 )
--[[ a       3506
au      190
aut     83
auto    53
auto.   52
auto.*  52
auto.*t 23
auto.*ti        11
auto.*tio       2
auto.*tion      2
runtime:        0.073837999999999
--]] 

t1=os.clock() ; tab=dict_match( dict_index , "austr.*") ; print("runtime:" , os.clock() - t1 ) ; tab:each(print)

shewer commented 4 years ago

Then use each to feed the results to yield:
tab:each( function(elm) yield( Candidate( ...... elm ) ) end )

The filter uses dict_info[ cand.text ] to fetch the info.

sdadonkey commented 4 years ago

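Actually this is good too: build your own dictionary and produce the candidates without going through Rime's mechanism, which saves sorting them again.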

shewer commented 4 years ago

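Quoting sdadonkey: "Actually this is good too: build your own dictionary and produce the candidates without going through Rime's mechanism, which saves sorting them again."

Yes. In translator(input, seg, env) the candidate can be given the type "english" when it is yielded, and filter(input, env) can then test cand.type == "english".

The framework is in place. What is still missing is match( string ), which has to replace the wildcard characters in the string with a string:match pattern. That part can also be verified offline.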

shewer commented 4 years ago

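Quoting sdadonkey: "Actually this is good too: build your own dictionary and produce the candidates without going through Rime's mechanism, which saves sorting them again."

OK, it works.
Without a lua_processor it behaves like a Chinese input method mode, but occasionally the current application gets closed.
With fluid_editor there is no difference: the space key still only commits, so a lua_processor is still needed to intercept keys.

ascii_composer
lua_processor@english_processor -- add another switch to enter the lua_translator
recognizer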

shewer commented 4 years ago

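I have never quite understood how segmentors work, and your code uses that mechanism. If I add a lua_processor and it accepts a key, can it designate a lua_translator to handle the preedit text? The schema I am testing sets luna_pinyin as the main dictionary. The lua_processor determines the mode, and in English typing mode it routes the input to the lua_translator, which would bypass tag abc.

Also, the code uses a Composition object, which hangs off env.context.composition. What is it for, and what does :back() do? And what is the Segmentation object for, as in lua_segmentor( Segmentation * segmentation, env), and what does its :back() do?

Thanks.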

sdadonkey commented 4 years ago

lua_processor

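As I understand it, a segmentor is responsible for splitting context.input into segments and tagging them. In ascii_mode all input is tagged raw by ascii_segmentor. My program has to convert every input character to lowercase before table_translator can translate it. Modifying context.input inside a segmentor ends in an infinite loop, but modifying segmentation.input does not deadlock. The current version of librime-lua provides no interface for adding or modifying tags, so you cannot designate a lua_translator for a key. composition:back() and segmentation:back() both return a Segment object.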

sdadonkey commented 4 years ago

dict_file= io.open("english.txt")

Is it OK not to specify a path for the file?

shewer commented 4 years ago

dict_file= io.open("english.txt")

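Quoting sdadonkey: "Is it OK not to specify a path for the file?"

No. Lua provides no interface for that, and at runtime the working directory is the Rime program directory.

I take APPDATA from the environment variables that the command shell presets (APPDATA=C:\Users\shewe\AppData\Roaming) and append "\Rime":
USERDIR= os.getenv("APPDATA") .. "\\Rime"
dict_file= io.open( USERDIR .. "\\" .. filename)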


 local function init_dict( filename)
     local dict_file= io.open( USERDIR .. "\\" .. filename)
     local dict_index=setmetatable({},{__index=table})
     local dict_info=setmetatable({},{__index=table})
     for line in dict_file:lines() do
         local word,info = line:split("\t"):unpack()
         dict_info[word]=info
         dict_index:insert(word)
     end
     return dict_index,dict_info

 end

local function english()
    local dict_index,dict_info= init_dict("english.txt")
-- .....

end
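Because io.open returns nil plus a message when the file is not found, a wrong working directory only shows up later as an error on `dict_file:lines()`. A small sketch of a more defensive open follows; `join_path` and `open_dict` are hypothetical helper names, and the directory separator is read from `package.config` instead of being hard-coded as a backslash.

```lua
-- First character of package.config is the platform's directory separator.
local SEP = package.config:sub(1, 1)

local function join_path(dir, filename)
  return dir .. SEP .. filename
end

-- Returns the open file, or nil plus a readable message.
local function open_dict(dir, filename)
  local path = join_path(dir, filename)
  local file, err = io.open(path, "r")
  if not file then
    return nil, "cannot open " .. path .. ": " .. tostring(err)
  end
  return file
end

local file, err = open_dict("no_such_dir_xyz", "english.txt")
print(file, err)
```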

Here is my Lua source for reference. english_init provides string:split, table:each and table:find_all; english.lua loads the lua_translator and the lua_filter.

-- rime.lua 
USERDIR=os.getenv("APPDATA") or "" 
USERDIR= USERDIR .. "\\Rime"

local english = require("english")()
--english_processor = english.processor
--english_segmentor = english.segmentor
english_translator = english.translator
english_filter = english.filter
---   rime.lua  EOF

-- english_init.lua 
#! /usr/bin/env lua
--
-- english_init.lua
-- Copyright (C) 2020 Shewer Lu <shewer@gmail.com>
--
-- Distributed under terms of the MIT license.
--
--

function string.split( str, sp,sp1)
        if   type(sp) == "string"  then
                if sp:len() == 0 then
                        sp= "([%z\1-\127\194-\244][\128-\191]*)"
                elseif sp:len() > 1 then
                        sp1= sp1 or "^"
                        _,str= pcall(string.gsub,str ,sp,sp1)
                        sp=  "[^".. sp1.. "]*"

                else
                        if sp =="%" then
                                sp= "%%"
                        end
                        sp=  "[^" .. sp  .. "]*"
                end
        else
                sp= "[^" .. " " .."]+"
        end

        local tab= setmetatable( {} , {__index=table} )
        flag,res= pcall( string.gmatch,str,sp)
        for  v  in res   do
                tab:insert(v)
        end
        return tab
end

table.each=function(tab,func)
        for i,v in ipairs(tab) do
                func(v,i)
        end
        return tab
end
table.find_all=function(tab,elm,...)
        local tmptab=setmetatable({} , {__index=table} )
        local _func=  (type(elm) == "function" and elm ) or  function(v,k, ... ) return  v == elm  end
        for k,v in pairs(tab) do
                if _func(v,k,...) then
                        tmptab:insert(v)
                end
        end
        return tmptab
end
-- english_init.lua EOF

-- english.lua 
#! /usr/bin/env lua
--
-- english.lua
-- Copyright (C) 2020 Shewer Lu <shewer@gmail.com>
--
-- Distributed under terms of the MIT license.
--

require "english_init"

local function match( str )
    return function(elm)
        return   elm:lower():match( "^" .. str:lower() ) 
    end 
end 

local function init_dict( filename) 
    local dict_file= io.open( USERDIR .. "\\" .. filename)
    local dict_index=setmetatable({},{__index=table})
    local dict_info=setmetatable({},{__index=table})
    for line in dict_file:lines() do 
        local word,info = line:split("\t"):unpack()
        dict_info[word]=info
        dict_index:insert(word)
    end 
    return dict_index,dict_info

end 

local function dict_match(tab, str)
    -- narrow the word list one prefix at a time
    for i=1,#str do 
        local substr= str:sub(1,i) 
        tab=tab:find_all( match( substr ) )
    end 
    return tab
end 

local function lua_init()

    local dict_index,dict_info= init_dict( "./english.txt") 
    local function processor_func(key,env) -- key:KeyEvent,env_

    end 

    local function processor_init_func(env)
    end 
    local function processor_fini_func(env)
    end 

-- lua segmentor
    local function segmentor_func(segmentation,env) -- segmetation:Segmentation,env_
    end
    local function segmentor_init_func(env)
    end 
    local function segmentor_fini_func(env)
    end 
-- lua translator 
    local function translator_func(input,seg,env)  -- input:string, seg:Segment, env_
        local tab=env.dict_index
        dict_match( tab , input):each( function(elm) 
            yield( Candidate("english", seg.start,seg._end, elm, "[english]") )
        end 
        )
    end 

    local function translator_init_func(env)
        env.dict_index= dict_index
    end 
    local function translator_fini_func(env)
    end 

-- lua filter

--  cand data to string 
    local function candinfo_func(cand,env,option)
        if option then 
            return  string.format("|t:%s s:%s e:%s q:%6.3f,p:%s,ns:%s|",
            cand.type,cand.start,cand._end,cand.quality,cand.preedit,env.namespace)
        else 
            return ""
        end 
    end 
    local function filter_func(input,env)  -- input:Tranlation , env_

        for cand in  input:iter() do 

            if cand.type== "english" then 
                cand.comment=  env.dict[cand.text] 
            end 
            --cand.comment= cand.comment .. "--" .. candinfo_func(cand,env,true) 
            yield(cand)
        end 
    end 

    local function filter_init_func(env)
        env.dict= dict_info
    end 
    local function filter_fini_func(env)
    end 

    return { 
        --processor= { func=processor_func,   init=processor_init_func, fini=nil } , 
        --segmetor=  { func=segmetor_func,  init=segmetor_init_func , fini=nil} , 
        translator={ func=translator_func,init=translator_init_func,fini=nil } , 
        filter=    { func=filter_func ,   init=filter_init_func,    fini=nil } ,   
    }

end 

return lua_init

shewer commented 4 years ago

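So once a processor has accepted a key, how is it passed on to a specific segmentor, so that the segmentor can set a tag on the segmentation? I know that a lua_translator can be given a tag:

lua_translator@english

english:
    tag: english

Adding an english switch would then give three complete modes: ASCII input, Chinese input and english input.

lua-english could then be loaded into any schema.

Idea for the english module: when english is on, lua_processor@english accepts [a-zA-Z?*] --> segmentor --> english_translator.
"space": commit the preedit and append a space
, .: commit the preedit and append the , or .
"\t": put the currently highlighted menu string into preedit.text, after backing up preedit.text with table.insert
"shift-tab": restore the previous preedit

With this design, english mode behaves like ascii_mode plus a dictionary display.

1234567890 are still handled by the selector and navigator processors. selector handles the digit selection keys (which can be changed), moving up and down through the candidates, and paging. navigator handles cursor movement inside the input field.

Examples:
autocad" " --> autocad" "
autocad"1 " --> autoCAD" "
autoca"\t" --> (preedit.text) = autoCAD
autocad. --> autocad.
autoca"\t. " --> autoCAD." "
autoca"\t" --> (preedit.text) = autoCAD, then "shift-tab" --> (preedit.text) = autoca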
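The key handling proposed above can be modelled outside Rime as a small state machine. This is only a sketch of the idea: `new_state`, `handle` and the `pick` callback (which stands in for the highlighted candidate of the menu) are hypothetical, keys are plain strings instead of Rime key events, and digit selection is left out.

```lua
local function new_state(pick)
  return { preedit = "", history = {}, committed = "", pick = pick }
end

local function handle(state, key)
  if key:match("^[%a?*]$") then
    state.preedit = state.preedit .. key            -- letters and wildcards extend the preedit
  elseif key == "tab" then
    local cand = state.pick(state.preedit)
    if cand then
      table.insert(state.history, state.preedit)    -- back up before replacing
      state.preedit = cand
    end
  elseif key == "shift-tab" then
    local prev = table.remove(state.history)        -- restore the previous preedit
    if prev then state.preedit = prev end
  elseif key == " " or key == "," or key == "." then
    state.committed = state.committed .. state.preedit .. key
    state.preedit, state.history = "", {}
  end
end

local s = new_state(function(p) if p == "autoca" then return "autoCAD" end end)
for ch in ("autoca"):gmatch(".") do handle(s, ch) end
handle(s, "tab")
local after_tab = s.preedit
handle(s, "shift-tab")
local after_undo = s.preedit
handle(s, "tab")
handle(s, ".")
print(after_tab, after_undo, s.committed)
```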

sdadonkey commented 4 years ago

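Quoting shewer: "So once a processor has accepted a key, how is it passed on to a specific segmentor?"

A segmentor cannot be designated. Different segmentors tag the segments that meet their conditions: ascii_segmentor tags raw, abc_segmentor tags abc, punct_segmentor tags punct. The most flexible one is matcher, which applies the tag you specify according to the recognizer result. Translators then translate the segments whose tag they match (abc when none is specified). You can test this with something like segment:has_tag("abc").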
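The tagging flow described here can be pictured with a toy model in plain Lua. It is only a sketch under stated assumptions: `tag_segment`, `translators` and `dispatch` are hypothetical names, the matching rules are heavily simplified, and none of this is librime's actual implementation.

```lua
-- Toy model of segment tagging and translator dispatch (not librime code).
local function has_tag(segment, tag)
  return segment.tags[tag] == true
end

-- Hypothetical segmentor stage: decide which tag a piece of input gets.
local function tag_segment(text, ascii_mode)
  local segment = { text = text, tags = {} }
  if ascii_mode then
    segment.tags.raw = true        -- the role described for ascii_segmentor
  elseif text:match("^E.+") then
    segment.tags.english = true    -- a recognizer pattern applied by matcher
  elseif text:match("^%l+$") then
    segment.tags.abc = true        -- the role described for abc_segmentor
  else
    segment.tags.punct = true      -- the role described for punct_segmentor
  end
  return segment
end

local translators = {
  { name = "table_translator", tag = "abc" },  -- abc is the default tag
  { name = "lua_translator@english", tag = "english" },
}

-- A translator only sees segments that carry its tag.
local function dispatch(segment)
  local names = {}
  for _, t in ipairs(translators) do
    if has_tag(segment, t.tag) then names[#names + 1] = t.name end
  end
  return names
end

print(dispatch(tag_segment("Eauto", false))[1])
print(#dispatch(tag_segment("auto", true)))
```

In this model nothing picks up a raw segment, which mirrors the remark that keys typed in ascii_mode never reach a translator.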

shewer commented 4 years ago

Quoting sdadonkey: "A segmentor cannot be designated. Different segmentors tag the segments that meet their conditions [...] You can test this with something like segment:has_tag("abc")."

I looked through the Lua API and could not find a way to set a tag. If nothing else works I really will have to use the tag: raw approach. In that case, which setting do you usually use for ascii_composer/switch_key/Caps_Lock: commit_code, commit_text, noop or clear? processors/ascii_composer conflicts with english.

shewer commented 4 years ago

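The new dictionary lookup is now faster, under 10 ms.

dict_index is split into 26 tables by first letter. The first letter therefore cannot be * or ?, although a check could be added that searches all 26 tables in that case.

Added wildfmt(str), which converts wildcard input such as a?le or a*able into a match pattern.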


local function wildfmt(str)  --    replace ?* with a pattern    ? => [%a._]?   * => [%a._]*   and append "$"    %a = [a-zA-Z]
    local change
    str,change= str:gsub("([?*])","[%%a._]%1")
    -- anchor the end only when a wildcard was used; plain input stays a prefix match
    if change ~= 0 then
        str=   str .. "$"
    end
    return "^" .. str:lower()
end
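To see what wildfmt produces, here is a standalone copy with a few matches checked in plain Lua. The sample words are made up for illustration. Note that `?` becomes "zero or one letter" and `*` becomes "any number of letters" under this conversion.

```lua
-- Standalone copy of the conversion: ? => [%a._]?  and  * => [%a._]*
local function wildfmt(str)
  local change
  str, change = str:gsub("([?*])", "[%%a._]%1")
  if change ~= 0 then
    str = str .. "$"        -- a wildcard query must match the whole word
  end
  return "^" .. str:lower() -- no wildcard: plain prefix match
end

print(wildfmt("auto"))
print(wildfmt("a?le"))
print(wildfmt("a*able"))
print(("available"):match(wildfmt("a*able")))
```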

local function init_dict( filename)
    --local dict_file= io.open( USERDIR .. "\\" .. filename)
    local dict_file= io.open( ( USERDIR .. "/" .. filename) )
    local dict_index=setmetatable({},{__index=table})
    local dict_info=setmetatable({},{__index=table})
    -- one word list per first letter, a to z
    for i=0x61,0x7a do
        dict_index[string.char(i)] = setmetatable({},{__index=table})
    end
    for line in dict_file:lines() do
        if not line:match("^#") then
            local word,info = line:split("\t"):unpack()
            dict_info[word]=info
            dict_index[word:sub(1,1):lower() ]:insert(word)
            --dict_index:insert(word)
        end
    end
    return dict_index,dict_info
end

local function dict_match(tab, str)
    -- search only the list for the first letter
    tab=tab[str:sub(1,1):lower()]
    str=wildfmt(str)
    tab = tab:find_all(function(elm)
        return   elm:lower():match( str )
    end )
    return tab
end

-- test
dict_index,dict_info= init_dict( "./english.txt")
function test(str)
    local t1= os.clock()
    local tab=dict_match( dict_index, str)
    print( "count match:" , #tab, "runtime:", os.clock() - t1 )
    return tab
end
tab=test( "ab*tion")
--count match: 16 runtime: 0.0052119999999998
--table: 0x56438d39bac0

--[[
tab:each( function(elm) print( elm, dict_info[elm] ) end )
count match: 16 runtime: 0.0045699999999997
abbreviation [ә.bri:vi'eiʃәn] n. 缩写词, 缩写, 缩短, 节略
abdication [.æbdi'keiʃәn] n. 逊位, 弃权, 辞职
abduction [æb'dʌkʃәn] n. 诱拐, 绑架, 外展\n[医] 外展, 展
aberration [æbә'reiʃәn] n. 离开正路, 偏离, 畸变, 光行差, 心理失常, 色差\n[化] 光行差; 像差
abjection [æb'dʒekʃәn] n. 卑鄙, 落魄, 抛弃, 驱逐
ablation [æb'leiʃәn] n. 腐蚀, 切除, 烧蚀\n[化] 烧蚀
ablution [ә'blu:ʃәn] n. 洗澡, 洗礼, 斋戒沐浴\n[化] 洗净; 洗净液
abnegation [.æbni'geiʃәn] n. 放弃, 舍弃, 克制
abolition [.æbәu'liʃәn] n. 废除, 废奴运动\n[医] 禁止, 消失
abomination [ә.bɒmi'neiʃәn] n. 厌恶, 痛恶, 令人厌恶的事物
abortion [ә'bɒ:ʃәn] n. 流产, 堕胎, 失败, 夭折, 中止\n[医] 流产, 小产; 顿挫
abrogation [.æbrәu'geiʃәn] n. 废除, 取消\n[经] 废除, 取消
absolution [æbsә'lu:ʃәn] n. 免罪, 赦免, 免除\n[法] 免罚, 赦免, 免除
absorption [әb'sɒ:pʃәn] n. 吸收, 专心, 全神贯注\n[化] 吸收; 吸收作用
abstention [әb'stenʃәn] n. 戒绝, 回避, 弃权
abstraction [æb'strækʃәn] n. 抽象化, 心不在焉, 空想, 提炼, 抽象派作品\n[化] 提取; 抽取; 夺取; 夺取反应; 除去
--]]
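The speed-up comes from scanning only the words that share the first letter of the query. The same idea on an in-memory word list, without the file handling or the metatable helpers, might look like the following; `build_index` and `find_prefix` are hypothetical names and the word list is made up.

```lua
-- Split a word list into 26 buckets keyed by lowercase first letter.
local function build_index(words)
  local index = {}
  for byte = string.byte("a"), string.byte("z") do
    index[string.char(byte)] = {}
  end
  for _, word in ipairs(words) do
    local bucket = index[word:sub(1, 1):lower()]
    if bucket then bucket[#bucket + 1] = word end  -- words not starting with a letter are skipped
  end
  return index
end

-- Case-insensitive prefix search that only scans one bucket.
local function find_prefix(index, prefix)
  local key = prefix:lower()
  local bucket = index[key:sub(1, 1)] or {}        -- e.g. a leading "*" has no bucket
  local found = {}
  for _, word in ipairs(bucket) do
    if word:lower():sub(1, #key) == key then found[#found + 1] = word end
  end
  return found
end

local index = build_index({ "apple", "Auto", "banana", "autumn", "3D" })
local hits = find_prefix(index, "au")
print(table.concat(hits, " "))
```

A query that starts with a wildcard finds no bucket here, which is the limitation noted above; handling it would mean scanning all 26 lists.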

sdadonkey commented 4 years ago

The approach is very good. I plan to adopt it in the new version.

shewer commented 4 years ago

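Quoting sdadonkey: "The approach is very good. I plan to adopt it in the new version."

While reorganizing the code just now I found that this works: put all the english files in lua/english/. With the English dictionary next to the Lua files, no path handling is needed.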
--- rime.lua
local english = require("english")()
english_processor = english.processor
english_segmentor = english.segmentor
english_translator = english.translator
english_filter = english.filter

--- lua/english/init.lua -- start
require 'english_init' -- provides table.each(self,func), table.find_all(self,func), string.split(self, pattern)
string.find_words,string.word_info= require("english_dict")() -- attach both to the string table
-- test
-- string.find_words(self)   returns a table of words
-- string.word_info(self)    reverse lookup
--[[
str="auto"
str:find_words():each(function(elm) print(elm, elm:word_info() ) end ) -- same as the example above
]]

local function lua_init()

return {
    processor= { func=processor_func, init=processor_init_func, fini=nil } ,
    segmentor= { func= segmentor_func, init=segmentor_init_func , fini=nil} ,
    translator={ func=translator_func, init=translator_init_func,fini=nil } ,
    filter=    { func=filter_func, init=filter_init_func,    fini=nil } ,
    dict_match= function(str,step) return dict_match(dict_index,str,step) end ,
    dict_info= dict_info,
}

end

return lua_init
----- init.lua --end

---lua/english/english_dict.lua
-- ........
local function init(filename)

filename= filename or "./english.txt"
local dict_index,dict_info = init_dict(filename)

local function words(str)
    return dict_match(dict_index,str)
end
local function info(str)
    return dict_info[str]
end
local function unload()
    package.loaded["english_dict"]=nil
end

return words,info ,unload

end
return init
--- english_dict end

sdadonkey commented 4 years ago

Very good.

shewer commented 4 years ago

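This way there is no need to search everything: once menu_size is filled the search pauses, and the rest (paging, selection, editing) is left to Rime's own components.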

    local function _iter_match_func(tab, str, func ) -- table 是單字母 字典
        local iter,tab,index = ipairs(tab)
        return function()
            for i,v in iter ,tab, index do
                index = i  -- keep index for next start from index+1
                if  v:lower():match( str ) then
                    return func(v)
                end
            end
            return nil
        end
    end

 function lua_translator(input,seg,env) 
       -- getinfo maps a matched word to (text, comment); the comment lookup is elided here
       local function getinfo(text)     return text,comment  end 

      for  text,comment in _iter_match_func( a_tab,"auto" , getinfo ) do
                  if fold_sw then 
                         comment:split("\\n"):each ( function(elm) yield( Candidate(..)) end )
                   else  yield(Candidate(.....) )  end 

       end 

end
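The point of the iterator is that matching resumes where it stopped, so only as many words are scanned as the menu asks for. A self-contained sketch of that behaviour follows; `iter_match` and `take` are hypothetical names, and `take` stands in for Rime pulling one page of candidates.

```lua
-- Returns a function that yields the next matching word on each call.
local function iter_match(words, pattern)
  local i = 0
  return function()
    while i < #words do
      i = i + 1
      local word = words[i]
      if word:lower():match(pattern) then
        return word     -- the position is kept in i for the next call
      end
    end
    return nil
  end
end

-- Pull at most n results, like filling one page of the candidate menu.
local function take(iter, n)
  local page = {}
  while #page < n do
    local word = iter()
    if word == nil then break end
    page[#page + 1] = word
  end
  return page
end

local words = { "auto", "autumn", "apple", "automatic", "autocad" }
local next_match = iter_match(words, "^auto")
local page1 = take(next_match, 2)
local page2 = take(next_match, 2)
print(table.concat(page1, " "), "|", table.concat(page2, " "))
```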

sdadonkey / rime-english

What are the four functions for? #1
