Closed ayaka14732 closed 3 years ago
Change data format, making use of QieyunEncoder v0.2.x. The new format is easier to maintain.
Sample build script:
Input: 廣韻(20170209).xls, preprocessed into .csv format, containing these columns:
廣韻反切(覈校後),廣韻字頭(覈校後),廣韻釋義,釋義補充,聲紐,呼,等,韻部(調整後),聲調
Python script:
    from QieyunEncoder import to描述

    with open('src.csv') as f, open('data.csv', 'w') as g:
        next(f)  # skip the header row
        for line in f:
            try:
                反切, 字頭, 解釋, 補充, 母, 呼, 等, 韻, 聲 = line.rstrip('\n').split(',')
            except ValueError:
                # Report rows that do not have exactly nine columns, then skip them
                print(line)
                continue

            # Split the 重紐 marker off the rhyme
            重紐 = 韻[1:]
            韻 = 韻[:1]

            # Normalize variant characters
            if 母 == '群':
                母 = '羣'
            elif 母 == '娘':
                母 = '孃'
            if 韻 == '真':
                韻 = '眞'

            # Drop redundant attributes
            if not (母 in '幫滂並明見溪羣疑影曉' and 韻 in '支脂祭眞仙宵清侵鹽'):
                重紐 = None
            if 母 in '幫滂並明' or 韻 in '東冬鍾江虞模尤幽':
                呼 = None

            # Rhyme groups (小韻) without a fanqie
            if len(反切) != 2:
                反切 = ''

            描述 = to描述(母, 呼, 等, 重紐, 韻, 聲)
            print(描述, 反切, 字頭, 解釋, sep=',', file=g)
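
The normalization steps in the script can be checked in isolation, without QieyunEncoder or the source file. The following is a minimal sketch, not part of this PR: the helper name `normalize` is hypothetical, and it assumes the 韻部(調整後) column holds one rhyme character followed by an optional 重紐 marker (the sample value '真B' is an assumption about that column, not taken from the data).

```python
def normalize(母, 呼, 韻部):
    """Hypothetical helper mirroring the script's normalization steps."""
    # Split the column into the rhyme character and the 重紐 suffix
    # (assumes a one-character rhyme followed by an optional marker)
    韻, 重紐 = 韻部[:1], 韻部[1:]

    # Map variant characters to the forms used in the script
    母 = {'群': '羣', '娘': '孃'}.get(母, 母)
    if 韻 == '真':
        韻 = '眞'

    # Drop attributes that are redundant for this initial/rhyme combination
    if not (母 in '幫滂並明見溪羣疑影曉' and 韻 in '支脂祭眞仙宵清侵鹽'):
        重紐 = None
    if 母 in '幫滂並明' or 韻 in '東冬鍾江虞模尤幽':
        呼 = None

    return 母, 呼, 重紐, 韻


print(normalize('群', '開', '真B'))  # ('羣', '開', 'B', '眞')
print(normalize('幫', '開', '東'))   # ('幫', None, None, '東')
```

The returned values, together with 等 and 聲調, correspond to the arguments the script passes to `to描述`.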