Add proactive and inactive detective mode

yujinlin0224 commented 2 years ago

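In detection mode, auto-conversion still converts pages in non-Chinese languages, such as Japanese and English websites: any Chinese characters they contain get converted anyway, which may not be very reasonable. Looking at the source, I found that src/background/runtime/handle-get-auto-convert.ts converts non-Chinese pages straight to the target Chinese variant instead of ignoring them, which makes the so-called "detection" mode rather pointless.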

To solve this, could we simply change

      case ZhType.und:
        return target;

to

      case ZhType.und:
        return undefined;

and have it fixed? Or, alternatively, add a separate mode that only auto-detects and converts Chinese-language pages?
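To make the proposal concrete, here is a minimal TypeScript sketch of the "aggressive vs. conservative" choice for the undetermined-language case. It does not reproduce the real handle-get-auto-convert.ts: the `ZhType` object is a stand-in in which only `und` comes from the snippet above (the `hans`/`hant` members are assumed), and `DetectMode` and `resolveAutoConvert` are hypothetical names.

```typescript
// Hypothetical stand-in for the extension's ZhType; only `und` is taken from the
// thread, the `hans`/`hant` members are assumptions for this sketch.
const ZhType = { hans: 'hans', hant: 'hant', und: 'und' } as const;
type ZhType = (typeof ZhType)[keyof typeof ZhType];

// Hypothetical option: 'aggressive' keeps today's fallback, 'conservative' skips.
type DetectMode = 'aggressive' | 'conservative';

// Decide which variant to convert a page to; `undefined` means "leave it alone".
function resolveAutoConvert(
  detected: ZhType,
  target: Exclude<ZhType, 'und'>,
  mode: DetectMode,
): ZhType | undefined {
  switch (detected) {
    case ZhType.hans:
    case ZhType.hant:
      // A page detected as Chinese is converted in either mode.
      return target;
    case ZhType.und:
      // Undetermined (e.g. non-Chinese) page: only the aggressive mode falls back.
      return mode === 'aggressive' ? target : undefined;
    default:
      return undefined;
  }
}

console.log(resolveAutoConvert(ZhType.und, ZhType.hant, 'aggressive'));   // hant
console.log(resolveAutoConvert(ZhType.und, ZhType.hant, 'conservative')); // undefined
```

The aggressive mode corresponds to the current `return target;` fallback, and the conservative mode to the proposed `return undefined;`.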

I've raised a related issue before: https://github.com/tongwentang/New-Tongwentang-for-Firefox/issues/38

t7yang commented 2 years ago

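When the browser can't detect the current language, all the extension can do is take either an aggressive or a passive approach, and neither can satisfy everyone. At most, the aggressive and passive strategies could later become an option so users can choose for themselves.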

yujinlin0224 commented 2 years ago

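I know. The most the browser can do is go by the lang attribute of <html>, and even that isn't 100% accurate, since it depends on whether the page's developer bothered to set it properly. Still, the best would be an option that decides purely from the <html> lang attribute, auto-converting only when it starts with zh and never falling back, so users have more choice.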
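A sketch of such a lang-only check, assuming the value is read from `document.documentElement.lang` (a standard DOM property) and passed in as a string; `isChineseLang` is a hypothetical helper, not part of the extension. It compares the primary subtag rather than doing a raw `startsWith('zh')`, so `zh-TW` or `zh-Hant` match while an unrelated code that merely begins with those letters (e.g. `zha`) does not, and it also tolerates the non-standard `zh_TW` spelling.

```typescript
// Hypothetical helper: decide purely from the <html lang> value, with no fallback.
// In a content script the value could come from document.documentElement.lang.
function isChineseLang(lang: string | null | undefined): boolean {
  if (!lang) return false; // missing or empty lang: treat as "not Chinese"
  // Language tags are case-insensitive; take the primary subtag before '-' (or '_').
  const primary = lang.trim().toLowerCase().split(/[-_]/)[0];
  return primary === 'zh';
}

for (const lang of ['zh-TW', 'zh-Hant', 'ZH', 'zh_CN', 'ja', 'en-US', 'zha', '']) {
  console.log(JSON.stringify(lang), isChineseLang(lang));
}
```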

t7yang commented 2 years ago

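The extension itself doesn't do any language detection; it directly calls the API the browser provides. I'll consider adding "aggressive" and "conservative" options later (as mentioned in my previous comment).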

uttchen commented 2 years ago

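Quick question: since the update, it seems the English apostrophe ' gets automatically converted into the Chinese closing quote 』. Not sure if it's related, but is there a way to fix this?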

t7yang commented 2 years ago

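@uttchen That isn't the English single quote but the Chinese single quote (strictly speaking, the mainland Chinese single quotation mark being converted to the Taiwanese one). Users currently can't choose which parts of the built-in dictionary to apply; that should come in the future.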

yujinlin0224 commented 2 years ago

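> @uttchen That isn't the English single quote but the Chinese single quote (strictly speaking, the mainland Chinese single quotation mark being converted to the Taiwanese one)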

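In fact, English-language publications also use ‘ (U+2018) and ’ (U+2019) as quotation marks, the same characters as the Chinese (mainland) quotes, and some English websites use them too for better typography.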

uttchen commented 2 years ago

@t7yang @yujinlin0224

I'm not sure what you mean. Here's what I'm seeing: on the left is Firefox with the updated extension installed, on the right is Chrome without it.

All the English quotation marks have been replaced with Chinese ones.

Also, English apostrophes get wrongly converted as well.

I never ran into this before the update.

t7yang commented 2 years ago

Indeed, I didn't take this into account; I don't use auto-conversion myself, so I never noticed.

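However, this belongs to the dictionary, so please open an issue at https://github.com/tongwentang/tongwen-dict/issues listing the problematic punctuation marks you found, and I'll comment them out for now.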

Later I'll consider implementing punctuation dictionaries that are applied per site or per language.

alabamagan commented 2 years ago

Hi, I think a good way to handle this is to add a simple neighbor check when converting these symbols. Obviously, you only want to convert when the immediate neighbor is a Chinese character.
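A rough TypeScript sketch of that neighbor check, assuming a hypothetical `convertPunctuation` helper and a tiny illustrative mapping (“ → 「, ” → 」, ‘ → 『, ’ → 』) rather than the actual tongwen-dict entries. Han characters are detected with the Unicode property escape `\p{Script=Han}`.

```typescript
// Illustrative mapping only (mainland quotes -> Taiwan quotes); not the real dictionary.
const PUNCT_MAP: Record<string, string> = {
  '\u201C': '「', // “
  '\u201D': '」', // ”
  '\u2018': '『', // ‘
  '\u2019': '』', // ’
};

// Built with the RegExp constructor; Unicode property escapes need an ES2018+ runtime.
const HAN = new RegExp('\\p{Script=Han}', 'u');

function isHan(ch: string | undefined): boolean {
  return ch !== undefined && HAN.test(ch);
}

// Replace a mapped symbol only when the character right before or after it is Han.
function convertPunctuation(text: string, map: Record<string, string>): string {
  const chars = Array.from(text); // split by code point, not by UTF-16 unit
  return chars
    .map((ch, i) => {
      const replacement = Object.prototype.hasOwnProperty.call(map, ch) ? map[ch] : undefined;
      if (replacement === undefined) return ch;
      return isHan(chars[i - 1]) || isHan(chars[i + 1]) ? replacement : ch;
    })
    .join('');
}

console.log(convertPunctuation('它說‘你好’', PUNCT_MAP));                 // 它說『你好』
console.log(convertPunctuation('It’s a ‘test’', PUNCT_MAP));              // It’s a ‘test’
console.log(convertPunctuation('English “這是中文” English', PUNCT_MAP)); // English 「這是中文」 English
```

English text such as It’s a ‘test’ is left untouched, but a quote wrapping a Chinese phrase inside English text is still converted, because its inner neighbors are Han characters.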

t7yang commented 2 years ago

What if an English article quotes a Chinese sentence?

English... “這是中文” English...

The quotation marks sit right beside Chinese characters.

Instead, maybe let the user set which dictionary to apply on each site:

> Later I'll consider implementing punctuation dictionaries that are applied per site or per language.

alabamagan commented 2 years ago

Honestly, that doesn't look too bad: both [English... 「這是中文」 English...] and [English... "這是中文" English...] look natural. But I do agree there should be room for users to choose, and I think that applies to the default behavior too, i.e. site-specific special cases plus user-defined rules that alter the default behavior, cos the default behavior will never satisfy everybody.
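One way such layering could work, sketched in TypeScript under assumed names (`DictName`, `SiteRule`, `matchesHost`, `dictsForHost` and the dictionary names are all hypothetical): a user-defined rule wins over a built-in site special case, which wins over the default dictionary set, and within each layer the first matching rule is used.

```typescript
// Hypothetical dictionary names and rule shape, for illustration only.
type DictName = 'chars' | 'phrases' | 'punctuation';

interface SiteRule {
  host: string; // exact hostname, or "*.example.com" for the domain and its subdomains
  dicts: DictName[]; // dictionaries to apply on matching pages
}

function matchesHost(pattern: string, host: string): boolean {
  if (pattern.startsWith('*.')) {
    const domain = pattern.slice(2); // "example.com"
    return host === domain || host.endsWith('.' + domain);
  }
  return host === pattern;
}

// Precedence: user-defined rules, then built-in site special cases, then defaults.
function dictsForHost(
  host: string,
  defaults: DictName[],
  builtinRules: SiteRule[],
  userRules: SiteRule[],
): DictName[] {
  for (const rules of [userRules, builtinRules]) {
    const hit = rules.find((rule) => matchesHost(rule.host, host));
    if (hit) return hit.dicts;
  }
  return defaults;
}

const defaults: DictName[] = ['chars', 'phrases', 'punctuation'];
const builtin: SiteRule[] = [{ host: '*.example.com', dicts: ['chars', 'phrases'] }];
const user: SiteRule[] = [{ host: 'blog.example.com', dicts: [] }];

console.log(dictsForHost('blog.example.com', defaults, builtin, user)); // [] (user rule)
console.log(dictsForHost('news.example.com', defaults, builtin, user)); // built-in rule
console.log(dictsForHost('example.org', defaults, builtin, user));      // defaults
```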

t7yang commented 2 years ago

> cos the default behavior will never satisfy everybody.

🤝

tongwentang / tongwentang-extension

Add proactive and inactive detective mode #54