ssb22 / CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)
http://ssb22.user.srcf.net/cedpane/
The Unlicense

克里斯提·鲁布托 Christian Louboutin #20

Closed by chinese-words-separator 2 years ago

chinese-words-separator commented 2 years ago

克里斯提·魯布托 克里斯提·鲁布托 [Ke4 li3 si1 ti2 · Lu3 bu4 tuo1] /Christian Louboutin (1963-), French fashion designer/

Saw in: https://www.netflix.com/watch/80118063?t=2339#:~:text=这是克里斯提·鲁布托本人

https://zh.m.wikipedia.org/zh-hans/克里斯提·魯布托

https://baike.baidu.com/item/克里斯提·鲁布托/11004107

I think it's good for CedPane to have the complete name of a person. The middle dot · for Western names seems to be pretty standard; the convention may have come from Chinese writers themselves, and it is consistent across many websites (Baidu, Wikipedia and Netflix, to name a few). I have yet to see a website that puts spaces around the middle dot. It is also easy to input from an IME: in macOS, when the pinyin IME is active, pressing the backtick ` produces a middle dot ·. The only thing preventing extensions from rendering a person's full name on Netflix (or any website, for that matter) is that the full name doesn't have a definition in the dictionary yet.

CedPane has an entry for Christian, but it is Christiaan with a double a, and it is 克里斯提安, not 克里斯提. CedPane's 克里斯提 is just Cristi, not Christian:

克里斯提 克里斯提 [Ke4 li3 si1 ti2] /Cristi/
克里斯提安 克里斯提安 [Ke4 li3 si1 ti2 an1] /Christiaan/

If CedPane has yet to adopt the middle-dot convention for full names, I will just contribute this:

魯布托 鲁布托 [Lu3 bu4 tuo1] /Louboutin/Christian Louboutin (1963-), French fashion designer/
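
For readers who want to process entries like the ones quoted in this comment, here is a minimal sketch of splitting one line into its fields. It assumes the line shape seen above (traditional headword, simplified headword, pinyin in square brackets, then slash-delimited glosses); `ENTRY_RE` and `parse_entry` are hypothetical names for illustration, not part of CedPane or any existing tool.

```python
import re

# Assumed line shape: TRADITIONAL SIMPLIFIED [pinyin] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line):
    """Split one CEDICT-style line into its fields; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        # Every slash-delimited gloss is kept, so a second gloss is not lost.
        "glosses": glosses.split("/"),
    }


entry = parse_entry(
    "魯布托 鲁布托 [Lu3 bu4 tuo1] "
    "/Louboutin/Christian Louboutin (1963-), French fashion designer/"
)
print(entry["glosses"])
```

Because the headword pattern only excludes whitespace, the same sketch would also accept a headword that contains a middle dot.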

ssb22 commented 2 years ago

Thanks, the reason why CedPane doesn't “do” middle dot is basically because Wenlin doesn't do it. I use Wenlin to manage the database, and Wenlin can basically only handle entries made up of hanzi. I could probably work around this with my Wenlin developer access but it could end up being a major change and would have to be tested really carefully. (And moving out of Wenlin would be a really major change to my workflow.) I'm not sure that Pleco would handle entries with mid-dot in them either. And I guessed the incremental advantage of having mid-dot entries is not really worth all that trouble, if we can make sure to have entries for both the first and last names in each case.

(I could put an entry that's made up of the names run together without a mid-dot in between, but that would be much less useful because the number of texts that do it without mid-dot, while non zero, is quite small and I'm not sure we should be encouraging doing it without the dot.)

CedPane does already have 克里斯蒂安 as “Kristian; Christian (modern first name)” (the part in parentheses is so as not to confuse it with 基督徒). CedPane is unfortunately inconsistent about how it handles more than one English name mapping to the same Chinese name: sometimes it has separate entries with different definitions, but other times it just puts multiple English names in the same definition (in which case the one listed first may or may not be the one you look up, so you might have to do a full-text search to find one of the others). One of these days I should do a big edit to make it more consistent in the way it handles this, or perhaps just sort things out gradually over time, but I'm not entirely sure what would be a "good" rule: yes, if there are 2 English names mapping to the same Chinese name then 2 entries might be more convenient than one, but what if there are 10? So for now I'm just being inconsistent (whether you get multiple entries or one depends on how I felt when I edited it, sorry). Might be better one day.
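
One way a consumer of the data could sidestep the "which name is listed first" problem is to index every English name in a definition, not just the first. The following is only a sketch under assumptions: it supposes that alternative names inside a gloss are separated by ";" and that notes appear in trailing parentheses, as in the 克里斯蒂安 definition quoted above. `build_english_index` is a hypothetical helper, not something CedPane or any dictionary app is known to provide.

```python
from collections import defaultdict


def build_english_index(entries):
    """Map each lower-cased English name to the headwords whose definition mentions it."""
    index = defaultdict(list)
    for headword, glosses in entries:
        for gloss in glosses:
            # Assumption: alternative names inside one gloss are separated by ";".
            for name in gloss.split(";"):
                # Drop a trailing note such as "(modern first name)".
                key = name.split("(")[0].strip().lower()
                if key and headword not in index[key]:
                    index[key].append(headword)
    return index


entries = [
    ("克里斯蒂安", ["Kristian; Christian (modern first name)"]),
    ("克里斯提", ["Cristi"]),
]
index = build_english_index(entries)
print(index["christian"])
```

With an index like this, a lookup by "Christian" finds the entry even though "Kristian" is listed first, whether the data uses one combined entry or several separate ones.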

Of course, the English pronunciation of “Christian” is ˈkrɪstʃ(ə)n (or ˈkrɪstɪən) but its French pronunciation is usually more like kʁis.tjɑ̃ which has less of a final "n" sound, so a French person called Christian will be less likely translated using a final 安 to stand for the "-an". So I think the 克里斯提 Cristi entry should be changed so it also says "Christian (French name)" or similar.

And yes Louboutin should be added (lots of Baidu results)

chinese-words-separator commented 2 years ago

"I'm not sure that Pleco would handle entries with mid-dot in them either"

Pleco does have entries for proper names that use the dot convention, e.g., 比尔·盖茨

However, their search does not yet work when you paste or type a name containing the dot. For now, when you search for such a name, Pleco splits the query in two, i.e., 比尔 and 盖茨, and you can then find 比尔·盖茨 from either 比尔 or 盖茨
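
A lookup that falls back to the separate parts of a dotted name can be sketched as follows. This is not Pleco's or any extension's actual implementation; `lookup_name` and the two-entry `dictionary` are hypothetical and only illustrate the idea of trying the full name first and then each part on either side of the middle dot.

```python
MIDDLE_DOT = "·"  # the separator used in names such as 比尔·盖茨


def lookup_name(name, dictionary):
    """Return (headword, definition) pairs, splitting on the middle dot if needed."""
    if name in dictionary:
        return [(name, dictionary[name])]
    # Fall back to the individual parts; a part with no entry maps to None.
    return [(part, dictionary.get(part)) for part in name.split(MIDDLE_DOT) if part]


# Hypothetical miniature dictionary with separate given-name and surname entries.
dictionary = {"克里斯提": "Cristi", "鲁布托": "Louboutin"}
full_name = "克里斯提" + MIDDLE_DOT + "鲁布托"
print(lookup_name(full_name, dictionary))
```

If a full-name entry is ever added, the first branch returns it directly and the fallback is never reached.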

FWIW, though CC-CEDICT has proper names that use the dot convention, this is their current stance on new additions (e.g., Christian Louboutin was rejected)

TL;DR They would accept manual addition of proper names if they are popular enough. Everything else should be automated

Editor: We are no longer adding proper nouns that are in Wikipedia[1] unless they are the sort of name likely to be in any C-E dictionary[2]. A comprehensive C-E dictionary of topics (including proper names) can be derived automatically from Wikipedia, and is likely to appear sooner or later. Pleco has indicated that they will offer this sort of name list after Pleco version 4 is released

[1] people, places, book titles etc. [2] 诸葛亮, 黄河, 易经 etc.

"look up by English" function but not a "look up by English starting half way through"

Regarding comma for ChinaScribe's lookup mechanism

[Lu3 bu4 tuo1] /Louboutin/Christian Louboutin (1963-), French fashion designer/

Maybe CedPane could use a delimiter workaround for additional glosses like the one above, so that a parser that does not follow ChinaScribe's comma-based lookup mechanism will not ignore the second gloss

ssb22 commented 2 years ago

"can be derived automatically from Wikipedia"? Oh, I've tried that, it's not very good. The main problems are:

  1. you can't usually get pinyin. Automatically generating the pinyin works in some cases but not others, it really needs some manual editing, for example to make sure that when there are two or more options we choose the one that sounds nearest to the original name. That's why my automatically extracted list above is English to Chinese characters with no pinyin. I sometimes use it as a reference when manually editing CedPane, but the raw automatically-extracted data is not so good. (There was a request that Wikipedia start a Pinyin edition but it was rejected.)
  2. Sometimes the English to Chinese correspondences in Wikipedia articles are not what you think. A very specific thing in English might cross-link to a more general category of things in Chinese, because this is the closest article containing the information you want, but that doesn't mean the Chinese word for the general category of things can be defined using the very specific thing in English. In some cases an article even cross-links to its opposite meaning in Chinese. Again, manual editing is required. It's OK to use the automatically-generated list as a reference when we're manually editing, as long as we know what we're getting!
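
The pinyin problem in point 1 can be illustrated with a small sketch. The `READINGS` table below is a deliberately tiny, incomplete assumption made up for this example (a real tool would need a full character-to-readings table), and `pinyin_candidates` is a hypothetical helper. It only enumerates the options; choosing the one nearest to the original name is the manual step.

```python
from itertools import product

# Illustrative, deliberately incomplete table of readings (an assumption for this sketch).
READINGS = {
    "柏": ["bai3", "bo2"],
    "林": ["lin2"],
}


def pinyin_candidates(word, readings=READINGS):
    """List every combination of known readings for the characters in word."""
    # Characters missing from the table are marked "?" rather than guessed.
    per_char = [readings.get(ch, ["?"]) for ch in word]
    return [" ".join(combo) for combo in product(*per_char)]


print(pinyin_candidates("柏林"))  # -> ['bai3 lin2', 'bo2 lin2']
```

For 柏林 as the place name Berlin, the reading with bo2 is the conventional one, which an automatic process cannot know without extra information about the original name.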

Yes Pleco version 4 will be nice when it comes out. I'm hoping they'll add CedPane (currently CedPane can be 'imported' into Pleco, but the 'importing' feature requires the paid version of Pleco; if they added it as one of the default options then it will hopefully be available in the free version of Pleco which will be nice). I don't know if they've also said they'll add an automatic list from Wikipedia.