i3thuan5 / SuiSiann-KauTui

Taiwan SuiSiann (台灣媠聲) annotation website

Spaces in the spoken-tone (口語調) markup become `\xa0`, and alignment with the tool fails in BDD #362

Closed: niauah closed this issue 2 years ago

niauah commented 2 years ago

This may be caused by the newer version of TinyMCE.
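A hedged sketch of where `\xa0` can come from: assuming the newer TinyMCE stores the space between two `<span>`s as the `&nbsp;` entity (not confirmed in this thread), decoding that HTML yields U+00A0 NO-BREAK SPACE, which Python's `repr()` displays as `\xa0`. The last step, mapping it back to an ordinary space, is only one possible workaround; `normalize_spaces` is a hypothetical helper, not part of this repo or `tuitse`.

```python
import html

# Assumed editor output: the space between two spans stored as &nbsp;
editor_html = '<span class="lui-1">tsin</span>&nbsp;<span class="lui-1">tiōng</span>'

# html.unescape() turns &nbsp; into U+00A0 NO-BREAK SPACE
decoded = html.unescape(editor_html)
print(repr(decoded))  # '<span class="lui-1">tsin</span>\xa0<span class="lui-1">tiōng</span>'


def normalize_spaces(text: str) -> str:
    """Hypothetical helper: replace every NO-BREAK SPACE with an ASCII space."""
    return text.replace('\xa0', ' ')


print(normalize_spaces('To-guân\xa0bûn-huà'))  # To-guân bûn-huà
```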

Tried it in `python manage.py shell` inside the docker container:

```pycon
>>> obj.羅馬字含口語調
'<p><span class="lui-1">To</span>-<span class="lui-8">guân</span>\xa0<span class="lui-8">bûn</span>-<span class="lui-2">huà</span>\xa0<span class="lui-4">berh</span>\xa0<span class="lui-1">siū</span>\xa0<span class="lui-8">lâng</span>\xa0<span class="lui-1">khíng</span>-<span class="lui-2">tīng</span>,\xa0<span class="lui-1">gír</span>-<span class="lui-1">giân</span>\xa0<span class="lui-1">tīr</span>\xa0<span class="lui-1">kong</span>-<span class="lui-1">líng</span>-<span class="lui-2">hi̍k</span>\xa0<span class="lui-8">ê</span>\xa0<span class="lui-8">thîng</span>-<span class="lui-2">hiān</span>\xa0<span class="lui-1">sī</span>\xa0<span class="lui-1">tsin</span>\xa0<span class="lui-1">tiōng</span>-<span class="lui-2">iàu</span>\xa0<span class="lui-8">ê</span> khai-<span class="lui-2">sí</span>.</p>'
>>> obj.羅馬字
'To-guân\xa0bûn-huà\xa0berh\xa0siū\xa0lâng\xa0khíng-tīng,\xa0gír-giân\xa0tīr\xa0kong-líng-hi̍k\xa0ê\xa0thîng-hiān\xa0sī\xa0tsin\xa0tiōng-iàu\xa0ê khai-sí.'
>>> from tuitse import kiamtsa
>>> kiamtsa(obj.漢字, obj.羅馬字)
[('多', 'To', 1, True), ('元', 'guân', 2, True), ('文', 'bûn', 1, True), ('化', 'huà', 2, True), ('欲', 'berh', 1, True), ('受', 'siū', 1, True), ('人', 'lâng', 1, True), ('肯', 'khíng', 1, True), ('定', 'tīng', 2, True), (',', ',', 1, True), ('語', 'gír', 1, True), ('言', 'giân', 2, True), ('佇', 'tīr', 1, True), ('公', 'kong', 1, True), ('領', 'líng', 2, True), ('域', 'hi̍k', 2, True), ('的', 'ê', 1, True), ('呈', 'thîng', 1, True), ('現', 'hiān', 2, True), ('是', 'sī', 1, True), ('真', 'tsin', 1, True), ('重', 'tiōng', 1, True), ('要', 'iàu', 2, True), ('的', 'ê', 1, True), ('開', 'khai', 1, True), ('始', 'sí', 2, True), ('。', '.', 1, True)]
>>> 檢查對齊狀態(obj.漢字, get_lomaji(clean_html(obj.羅馬字含口語調)), clean_html(obj.羅馬字含口語調))
'khai 標記錯誤'
```

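Alignment works. The only message is `'khai 標記錯誤'` ("khai markup error"), and indeed `khai` has no `<span>` in the input.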

Result of running the BDD tests:

```
  Scenario Outline: Kiám html hó-sè. -- @1.2 標記正確                                                                          # phiaua/features/clean_khaugitiau.feature:73
    When 有一句 真重要 <p><span class="lui-1">tsin</span>\xa0<span class="lui-1">tiōng</span>-<span class="lui-2">iàu</span></p> # phiaua/features/steps/khaugitiau.steps.py:16 0.001s
    Then 無顯示錯誤                                                                                                             # phiaua/features/steps/khaugitiau.steps.py:28 0.000s
      Assertion Failed: '詞內底的型、音bô平長' != ''
      - 詞內底的型、音bô平長
      +
```

Running the same Han characters (漢字) and romanization (羅馬字) in the shell:

```pycon
>>> khaugi = '<p><span class="lui-1">tsin</span>\xa0<span class="lui-1">tiōng</span>-<span class="lui-2">iàu</span></p>'
>>> 檢查對齊狀態('真重要', get_lomaji(clean_html(khaugi)), clean_html(khaugi))
''
```

Alignment does not fail there either.

niauah commented 2 years ago

Fixed in 04e183e.

niauah commented 2 years ago

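The cause of the error was that the `\x` got escaped: the step received a literal backslash followed by `xa0` instead of a non-breaking space, as `'tsin\\xa0tiōng-iàu'` in the output shows.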

```
  Scenario Outline: Kiám html hó-sè. -- @1.2 標記正確                                                                                # phiaua/features/clean_khaugitiau.feature:76
    Given 有一句錄音                                                                                                                  # phiaua/features/steps/khaugitiau.steps.py:7 0.001s
    When 漢字是 真重要 ,口語調是 <p><span class="lui-1">tsin</span>\xa0<span class="lui-1">tiōng</span>-<span class="lui-2">iàu</span></p> # phiaua/features/steps/khaugitiau.steps.py:25 0.001s
    Then 無顯示錯誤                                                                                                                   # phiaua/features/steps/khaugitiau.steps.py:38 0.000s
      Assertion Failed: '詞內底的型、音bô平長' != ''
      - 詞內底的型、音bô平長
      + 
       : ('真重要', 'tsin\\xa0tiōng-iàu')
```
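The debug line `('真重要', 'tsin\\xa0tiōng-iàu')` shows the mismatch: in a `.feature` file, `\xa0` is not an escape sequence, so the step receives a backslash followed by `xa0` (four characters) instead of one NO-BREAK SPACE. In the shell, the Python literal `'\xa0'` is already a single character, which is why the same input passed there. The sketch below illustrates the difference; `decode_hex_escapes` is a hypothetical helper showing one way a step could interpret `\xHH` sequences, not necessarily what 04e183e does.

```python
import re

# In a .feature file, "\xa0" is plain text: a backslash followed by "xa0".
step_text = r'tsin\xa0tiōng-iàu'
# In the Python shell, '\xa0' is a single NO-BREAK SPACE character.
shell_text = 'tsin\xa0tiōng-iàu'

print(len(step_text) - len(shell_text))  # 3 (four characters vs. one)

# Hypothetical helper: turn each literal \xHH sequence into the character it
# names, leaving all other text (including non-ASCII such as "ō") untouched.
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')


def decode_hex_escapes(text: str) -> str:
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_hex_escapes(step_text) == shell_text)  # True
```

A regex is used here instead of the `unicode_escape` codec so that text already containing non-ASCII romanization is left as it is.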
```pycon
>>> print('To-guân\xa0bûn-huà\xa0berh\xa0siū\xa0lâng\xa0khíng-tīng,')
To-guân bûn-huà berh siū lâng khíng-tīng,
```
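`print()` renders U+00A0 like an ordinary space, which is why the problem is easy to miss. A short standard-library check makes the character visible; the sample string is taken from the output above.

```python
import unicodedata

line = 'To-guân\xa0bûn-huà'

print(repr(line))                       # repr() reveals the NBSP as \xa0
print(unicodedata.name('\xa0'))         # NO-BREAK SPACE
print('\xa0' == ' ', '\xa0'.isspace())  # False True
print(line.split())     # ['To-guân', 'bûn-huà']: split() treats NBSP as whitespace
print(line.split(' '))  # ['To-guân\xa0bûn-huà']: split(' ') does not
```

So code that splits on any whitespace copes with `\xa0`, while code that compares strings exactly or splits on `' '` does not.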