ssb22 / CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)
http://ssb22.user.srcf.net/cedpane/
The Unlicense

Lexington #62

Closed: chinese-words-separator closed this 1 year ago

chinese-words-separator commented 1 year ago

列剋星敦 列克星敦 [Lie4 ke4 xing1 dun1] /Lexington/

Reference: https://youdao.com/w/eng/克星/#keyfrom=dict2.index#:~:text=他被告知再次骑上马回列克星敦

CedPane already has 列克星敦, but its Traditional form is the same as its Simplified form. Should the Traditional be 列剋星敦?
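The entry quoted above follows a CEDICT-style layout: Traditional headword, Simplified headword, pinyin in brackets, then slash-separated definitions. The sketch below parses such a line and checks whether the Traditional and Simplified headwords are identical, which is the situation the question describes. The regex and the function names are illustrative assumptions, not CedPane's own tooling.

```python
import re

# Assumed layout of a CEDICT-style line (as in the entry quoted above):
#   Traditional Simplified [pinyin] /definition 1/definition 2/.../
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict_line(line):
    """Return (traditional, simplified, pinyin, definitions), or None if the
    line does not match the assumed layout."""
    m = CEDICT_LINE.match(line)
    if m is None:
        return None
    trad, simp, pinyin, defs = m.groups()
    return trad, simp, pinyin, defs.split("/")


def trad_same_as_simp(line):
    """True if the entry's Traditional headword is identical to its Simplified one."""
    parsed = parse_cedict_line(line)
    return parsed is not None and parsed[0] == parsed[1]


print(parse_cedict_line("列剋星敦 列克星敦 [Lie4 ke4 xing1 dun1] /Lexington/"))
# ('列剋星敦', '列克星敦', 'Lie4 ke4 xing1 dun1', ['Lexington'])
print(trad_same_as_simp("列克星敦 列克星敦 [Lie4 ke4 xing1 dun1] /Lexington/"))
# True
```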

ssb22 commented 1 year ago

Possibly! 列剋星敦 has been used on Wikipedia (and on lots of sites that copied its data, e.g. travel-booking sites), but I'm always nervous about wiki typos, so I checked for use elsewhere: a Taiwan translator used it for a Lexington technology park; it has also been used in Taiwan for an aircraft carrier, and on Baidu Hong Kong for a computer-game character. Tianxun uses 剋 in a hotel name but 克 in the city name (in Lexington, North Carolina), which might mean either (1) Tianxun made a typo, or (2) there are different rules depending on which Lexington we're talking about, in which case we might need separate entries. But confirming that would need more data.

On balance, I'd say we put it in the same entry for now; we can always change it if something more concrete comes to light later.

ssb22 commented 1 year ago

Of course, what I'm really worried about is that there are currently 3911 other entries containing 克 that do not use 剋 in the Traditional, and the question is "should they?" (In many of them 克/剋 is not the only Simplified/Traditional difference in the word, so getting it wrong might have more consequences for a simple matcher.) I know 阿賓斯克 doesn't use 剋 in Traditional, for example, and neither does 阿蒂克, so I don't think we should make it a general rule that 克 'traditionalises' to 剋 in names, just as 里 does not always change to 裡 in names, only in words where it actually means "inside".
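To illustrate why a blanket rule is risky, here is a minimal sketch, assuming a hypothetical, deliberately naive character-by-character Simplified-to-Traditional table (not CedPane's data). Applied to 阿宾斯克 it yields 阿賓斯剋, whereas the attested Traditional form discussed above is 阿賓斯克. A second helper, also hypothetical, picks out entries whose Simplified form contains 克 but whose Traditional form has no 剋, the kind of filter that would produce a count like the one mentioned above.

```python
# Hypothetical, deliberately naive character-by-character Simplified -> Traditional
# table. It is NOT CedPane's data; it only shows why a blanket 克 -> 剋 rule misfires.
NAIVE_S2T = {"宾": "賓", "克": "剋"}


def naive_traditionalise(simplified):
    """Convert each character independently, ignoring context."""
    return "".join(NAIVE_S2T.get(ch, ch) for ch in simplified)


def entries_keeping_ke(entries):
    """Given (traditional, simplified) pairs, return those whose Simplified form
    contains 克 but whose Traditional form contains no 剋."""
    return [(t, s) for t, s in entries if "克" in s and "剋" not in t]


# Small illustrative sample using Traditional forms mentioned in the discussion.
sample = [
    ("阿賓斯克", "阿宾斯克"),
    ("阿蒂克", "阿蒂克"),
    ("列剋星敦", "列克星敦"),
]

print(naive_traditionalise("阿宾斯克"))  # 阿賓斯剋, but the attested form is 阿賓斯克
print(entries_keeping_ke(sample))       # [('阿賓斯克', '阿宾斯克'), ('阿蒂克', '阿蒂克')]
```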

ssb22 commented 1 year ago

Incidentally, that site uses 克 for Lexington in Traditional, and indeed auto-corrects 剋 to 克 when you search in Traditional. There seems to be no universal agreement about whether 剋 is always the Traditional form of 克 or just a variant.

In the specific case of Lexington, I think we can get away with adding a 剋 version anyway, because we have seen some use of it and there are otherwise no differences between Traditional and Simplified. If there were other differences between Traditional and Simplified, I'd be more worried (unless our software automatically explores all variants of each character when matching).
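Matching that "auto-explores all variants of each character" could look roughly like the sketch below. It assumes a hypothetical variant table containing only the 克/剋 pair from this thread, generates every substituted spelling of the query with `itertools.product`, and checks whether a headword is among them. This is an illustration, not how CedPane or any particular matcher actually works.

```python
from itertools import product

# Hypothetical variant table: each character maps to the spellings to try.
# Only the 克/剋 pair from this thread is listed; a real table would be larger.
VARIANTS = {"克": ("克", "剋"), "剋": ("克", "剋")}


def all_variant_spellings(text):
    """Every spelling obtained by substituting each character with its variants."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in text]
    return {"".join(combo) for combo in product(*choices)}


def variant_match(query, headword):
    """True if headword equals query up to the listed character variants."""
    return headword in all_variant_spellings(query)


print(sorted(all_variant_spellings("列克星敦")))  # ['列克星敦', '列剋星敦']
print(variant_match("列克星敦", "列剋星敦"))      # True
print(variant_match("阿蒂克", "列剋星敦"))        # False
```

The number of generated spellings is the product of the variant counts per character, so for long strings with many variant characters a cheaper alternative is to fold every character to one canonical form before comparing.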