ssb22 / CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)
http://ssb22.user.srcf.net/cedpane/
The Unlicense

Split override 萬代, 万代 #24

Closed chinese-words-separator closed 2 years ago

chinese-words-separator commented 2 years ago

https://www.youtube.com/watch?v=ljz_O1Y2Zag&t=2321s#:~:text=我族昌盛+延绵万代

image

ssb22 commented 2 years ago

That's not so good of CEDICT. ABC has: 万代[萬-] wàndài n. all ages; eternity. Pleco has: noun literary, all ages, generation after generation. I'm not so sure we'd want to word-override these, so this would be a "CEDICT specific" word override, which could be making things a bit complicated.

I think a better way to handle this is to add an entry for 延绵万代, as there are quite a lot of search results for it, so it's obviously a common formation like 千秋万代 (as it's literary, I guess it would be rare to get 万代 by itself and not part of a 4-character idiom which is what literary Chinese usually goes for, so as long as we've got all the 4-character idioms we shouldn't need to worry so much about 万代 by itself).

chinese-words-separator commented 2 years ago

> as it's literary, I guess it would be rare to get 万代 by itself and not part of a 4-character idiom which is what literary Chinese usually goes for, so as long as we've got all the 4-character idioms we shouldn't need to worry so much about 万代 by itself

Some language learners do want to dig deeper into the component words of a phrase or idiom. CWS has a feature for this: when the learner right-clicks a phrase or idiom..

image

..its base words are shown:

image

And some even want to dig deeper into the component base characters/radicals of a word or character. CWS facilitates this exploration via right-click too, e.g.,

image image image

I digress

I feel that phrases and idioms should be rootable to their component base words and characters; rooting 延绵_万代 to just Bandai is not a good proposition. Adding 万代 wàndài "all ages; eternity" will not override the Bandai entry, at least in CWS's processor. The new definition will simply be added, and if the existing definition is a proper noun it will be pushed down, so the common uses will be the first ones learners see, e.g.,

万代 wàndài all ages; eternity. Wàndài Bandai toy company
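
The ordering described here can be sketched roughly as follows. This is only an illustrative Python sketch, not CWS's actual processor: the `Definition` class, the `is_proper_noun` flag and `merge_definitions` are hypothetical names, and it assumes that proper-noun senses are simply sorted after common senses while everything else keeps its original order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    pinyin: str
    gloss: str
    is_proper_noun: bool = False  # hypothetical flag, e.g. for company names


def merge_definitions(existing, additions):
    """Add new senses without overriding old ones; proper nouns sink to the end."""
    merged = list(existing)
    for definition in additions:
        if definition not in merged:  # avoid listing the same sense twice
            merged.append(definition)
    # sorted() is stable, so senses of the same kind keep their relative order.
    return sorted(merged, key=lambda d: d.is_proper_noun)


existing = [Definition("Wàndài", "Bandai toy company", is_proper_noun=True)]
additions = [Definition("wàndài", "all ages; eternity")]
for definition in merge_definitions(existing, additions):
    print("万代", definition.pinyin, definition.gloss)
```

Under these assumptions the Bandai entry is kept rather than replaced; it just moves below the common-noun sense.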

ssb22 commented 2 years ago

Yes, but it would be copying the definition of a proprietary dictionary and I'm not sure we should do that. As I have the ABC "on tap" I've been trying very hard to avoid adding any words that are also in the ABC.

chinese-words-separator commented 2 years ago

> Yes, but it would be copying the definition of a proprietary dictionary and I'm not sure we should do that. As I have the ABC "on tap" I've been trying very hard to avoid adding any words that are also in the ABC.

I don't mean that we copy ABC's. Judging from "ten thousand" and "generation", and from the movie's context, it's intuitive to work out what 万代 translates to in English. It reminds me of Oracle's infamous range check: given the triviality of the allegedly infringing code, there aren't many ways to write a range check, so everyone's code will arrive at almost exactly the same implementation.

ssb22 commented 2 years ago

Yes, well rangeCheck was found to be infringing by a US court regardless of whether you or I think it should be that way or not. Disclosure: I currently hold a part-time position in Oracle further to their acquisition of a university-spinoff startup I was doing some work for. I'm a software engineer and not in their legal department, but I'm not going to say my employer's legal positions are wrong. Opinions expressed here are my own and not those of my employer.

chinese-words-separator commented 2 years ago

Interesting, I thought Google won that too in 2021. Upon further reading, I have now learned that Oracle won $150,000 on that range check.

Google did a clean-room implementation of Java; however, it looks like Joshua Bloch inadvertently implemented the same code, or perhaps copy-pasted (Ctrl+C, Ctrl+V) the same code for his old and new employers. But I think that even if he had not copy-pasted the code, he would inadvertently have implemented it the same way. The only way he could have avoided implementing it exactly as Oracle/Sun did would have been to use the ternary operator in the Google implementation, but throwing exceptions inside the ternary operator is possible in C#, not in Java. If Java had evolved fast enough, Joshua could even have used string interpolation instead of concatenation in the range check's error message.

It would be nice if programming languages provided more syntactic idioms, so that programmers could write unique code creatively; otherwise software engineers are doomed to implement the same idea in the same way.

Anyway, I also feel it all boils down to this: clean-room engineering should prevent Ctrl+C and Ctrl+V from working, so regardless of whether the programming language supports many idioms for the same thing, the implementors may well implement it differently. And if the implementors somehow arrive at exactly the same implementation, it really just means that the thing is trivial; the court can then decide accordingly whether, and how much, damage was incurred. Give range check as an assignment to a class of 20 students, and the teacher will think the students have copied each other's code.
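
To illustrate how little room a range check leaves for variation, here is a minimal Python sketch written for illustration only; it is not the code from the lawsuit. The function name `range_check`, the parameter names and the choice of exception types are all assumptions. Python, like Java, cannot raise an exception inside a conditional expression, so the checks end up as plain `if` statements; the f-strings show the string-interpolation variant mentioned above.

```python
def range_check(length, from_index, to_index):
    """Raise if [from_index, to_index) is not a valid range for `length` items."""
    if from_index > to_index:
        raise ValueError(f"from_index({from_index}) > to_index({to_index})")
    if from_index < 0:
        raise IndexError(f"from_index({from_index}) is negative")
    if to_index > length:
        raise IndexError(f"to_index({to_index}) > length({length})")


range_check(10, 2, 5)  # a valid range passes silently
```

Apart from the order of the three checks and the wording of the messages, there is not much left to vary.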

chinese-words-separator commented 2 years ago

I added the latest CedPane; it has 延绵万代 now. I think 万代 merits its own definition, since it carries one concept: through the ages.

Otherwise it will be a little odd when language learners dig into things: they will notice that although 延绵万代's 延绵 ("lasting") maps to "extend continuously", its 万代 ("through the ages") maps to Bandai.

image

Hmm.. perhaps I should also apply the split override to the dictionary itself; currently it's applied only to a page's sentences, and only when the splittable word is not part of a larger phrase/idiom.
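
As a rough illustration of that rule, the Python sketch below splits a word only when it stands as a token of its own. It is a guess at the general idea rather than CWS's real implementation: `apply_split_overrides` and the `overrides` table are hypothetical, and it assumes the segmenter has already grouped larger phrases and idioms into single tokens.

```python
def apply_split_overrides(tokens, split_overrides):
    """Split standalone tokens listed in split_overrides; leave longer tokens alone."""
    result = []
    for token in tokens:
        # Only an exact match is split. A longer phrase or idiom that merely
        # contains the word stays in one piece.
        if token in split_overrides:
            result.extend(split_overrides[token])
        else:
            result.append(token)
    return result


overrides = {"万代": ["万", "代"]}  # hypothetical override table
print(apply_split_overrides(["延绵万代"], overrides))
print(apply_split_overrides(["我族", "昌盛", "万代"], overrides))
```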

I don't necessarily believe that every run of joined characters should be defined, e.g., 社会保险金's 保险金

image

But for 万代, I think it does need a definition. It's just unfortunate that I can't find other phrases that are paired with 万代, since it's mostly Bandai that turns up in 万代's search results.

ssb22 commented 2 years ago

Ah, here's a solution.

Lamentations 5:19. King James Bible, 1611 "thy throne [remainest] from generation to generation". Chinese Union Bible 1919 你的寶座存到萬代.

That's an out-of-copyright definition of 萬代 as "from generation to generation". If we use that wording (not the "all ages" wording), nobody can argue it's a "micro-infringement" of copyrighted dictionaries.