mirakc / mirakc-arib

mirakc-tools for Japanese TV broadcast contents
Apache License 2.0

Replace enclosed ideographic characters with non-ideographic character sequences #6

Closed masnagam closed 4 years ago

masnagam commented 4 years ago

For example, replace 🈞 (U+1F21E) with [再].

ts::ARIBCharset::decode() doesn't replace enclosed ideographic characters as in the example above. We need to reimplement ts::ARIBCharset::decode() using another library or method, such as LibISDB::ARIBStringDecoder::Decode.

masnagam commented 4 years ago

It has been reported that, at present, EPGStation's duplicate-program exclusion does not work. https://twitter.com/onoreorg/status/1292244976781295616

This is because EPGStation is implemented as follows. https://github.com/l3tnun/EPGStation/search?q=deleteBrackets&unscoped_q=deleteBrackets

Even now, you can replace the enclosed characters by specifying a shell script like the following for config.jobs.update-schedules.command in mirakc.

mirakc-arib collect-eits <options> | replace-enclosed-chars

mirakc-arib collect-eits outputs EPG information as JSONL, but there is no need to parse the JSON; simply replacing the enclosed characters is enough.
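As an illustration, a replace-enclosed-chars filter could look roughly like the sketch below. It is not the fix that later landed in mirakc-arib. It assumes that the enclosed characters appear in the JSONL as literal UTF-8 (not as \u escapes) and handles only the U+1F210..U+1F23B block, whose Unicode names have the form "SQUARED CJK UNIFIED IDEOGRAPH-XXXX". The function names replace_enclosed_chars and replace_lines are hypothetical.

```python
import re
import unicodedata

# U+1F210..U+1F23B: characters named "SQUARED CJK UNIFIED IDEOGRAPH-XXXX".
# Other enclosed forms (e.g. U+1F200..U+1F202) are not handled in this sketch.
_SQUARED = re.compile("[\U0001F210-\U0001F23B]")
_PREFIX = "SQUARED CJK UNIFIED IDEOGRAPH-"


def _expand(match):
    """Turn one squared ideograph into "[X]", e.g. U+1F21E -> "[再]"."""
    ch = match.group(0)
    name = unicodedata.name(ch, "")
    if not name.startswith(_PREFIX):
        # Not known to this Python's Unicode database; leave it untouched.
        return ch
    # The name suffix is the hex code point of the plain ideograph.
    return "[" + chr(int(name[len(_PREFIX):], 16)) + "]"


def replace_enclosed_chars(text):
    """Replace every squared ideograph in text; no JSON parsing is needed."""
    return _SQUARED.sub(_expand, text)


def replace_lines(lines):
    """Apply the replacement line by line, preserving the JSONL framing."""
    for line in lines:
        yield replace_enclosed_chars(line)
```

To use it as the pipeline filter, a script could end with sys.stdout.writelines(replace_lines(sys.stdin)), run under a UTF-8 locale. Because the replacement text contains no quotes or backslashes, each output line stays valid JSON.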

masnagam commented 4 years ago

Fixed in commit f87be79f42015cbd63ae31aaa5ec513726dab4a6