ssb22 / CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)
http://ssb22.user.srcf.net/cedpane/
The Unlicense

Web-based dictionary editor for CedPane that can be used to edit, add, delete definitions #65

Closed: chinese-words-separator closed this issue 11 months ago

chinese-words-separator commented 11 months ago

This would let volunteers contribute new words and edit or delete incorrect ones.

CedPane has this definition:

報錯 报错 [bao4 cuo4] /error (message)/

CC-CEDICT has this definition today:

報錯 报错 [bao4 cuo4] /to report an error/to give wrong information/

总统先生报错了电话号码 (Mr. President gave the wrong phone number)

报错 by itself is a verb.

When paired with 信息, however, it is no longer a verb. Strictly speaking, the 报错 in 报错信息 may still carry a verbal sense in Chinese, but in natural English 报错信息 as a whole is simply an "error message"; the verb "report an error" does not need to be stated in English.

For example, 如何编写报错信息,提升用户使用体验 translates naturally as "How to Write Error Messages to Enhance the User Experience". Reference: https://www.freecodecamp.org/chinese/news/how-to-write-helpful-error-messages-to-improve-your-apps-ux/

With a web-based editor, each definition could be provided with context and examples.
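For anyone sketching such an editor: the entries quoted in this issue all follow the CEDICT line layout (traditional headword, simplified headword, pinyin in square brackets, then slash-separated senses). Below is a minimal Python sketch of reading one such line, assuming every line is well-formed in exactly that layout. `Entry` and `parse_line` are hypothetical names, not part of any CedPane or CC-CEDICT tooling.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: list[str]


# Assumed layout: TRAD SIMP [pinyin] /sense 1/sense 2/.../
# Comment lines and malformed lines are not handled, only rejected.
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_line(line: str) -> Entry:
    """Split one CEDICT-style dictionary line into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CEDICT-style line: {line!r}")
    trad, simp, pinyin, senses = match.groups()
    return Entry(trad, simp, pinyin, senses.split("/"))


if __name__ == "__main__":
    for line in (
        "報錯 报错 [bao4 cuo4] /error (message)/",
        "報錯 报错 [bao4 cuo4] /to report an error/to give wrong information/",
    ):
        print(parse_line(line).definitions)
```

Keeping each sense as a separate list item is what would let an editor attach context or examples to one sense without touching the others.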

ssb22 commented 11 months ago

I currently use the Wenlin desktop app for dictionary editing, and CedPane is automatically extracted from my personal Wenlin database (taking just the entries that I've marked as confirmed public domain). That means, if anybody else edits CedPane anywhere else, we also need a way to sync those changes back into my copy of Wenlin, preferably soon and without merge conflicts, or else I'd need to completely change my workflow. Given how much extra work it could take to set this up, it's probably worth it only if there'll be a lot of volunteer edits. If it's just a few, I can handle them manually and check them at the same time.
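Purely as an illustration of the sync concern: one common way to fold outside edits back in without clobbering local work is a per-headword three-way merge between the last published snapshot (base), the private copy (local) and the web edits (remote). The sketch below assumes entries can be exported as simple headword-to-definition mappings, which is not necessarily how Wenlin stores them; `three_way_merge` and the `w1`...`w5` headwords are hypothetical, and a real key would probably need headword plus pinyin.

```python
from typing import Optional

# Hypothetical export format: headword -> definition text.
Snapshot = dict[str, str]


def three_way_merge(
    base: Snapshot, local: Snapshot, remote: Snapshot
) -> tuple[Snapshot, list[str]]:
    """Merge local and remote edits made since a common base snapshot.

    A missing key means the entry does not exist (or was deleted) on that side.
    Returns the merged snapshot plus the headwords changed differently on both
    sides; for those, the local version is kept until resolved by hand.
    """
    merged: Snapshot = {}
    conflicts: list[str] = []
    for headword in sorted(base.keys() | local.keys() | remote.keys()):
        original: Optional[str] = base.get(headword)
        mine: Optional[str] = local.get(headword)
        theirs: Optional[str] = remote.get(headword)
        if mine == theirs:        # both sides agree, including both deleted
            result = mine
        elif mine == original:    # only the remote side changed it
            result = theirs
        elif theirs == original:  # only the local side changed it
            result = mine
        else:                     # changed differently on both sides
            conflicts.append(headword)
            result = mine
        if result is not None:
            merged[headword] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"w1": "def A", "w2": "def B", "w3": "def C", "w5": "def E"}
    local = {"w1": "def A", "w2": "def B (edited locally)", "w3": "def C", "w5": "def E1"}
    remote = {"w1": "def A (edited online)", "w2": "def B", "w4": "def D", "w5": "def E2"}
    merged, conflicts = three_way_merge(base, local, remote)
    print(merged)
    print(conflicts)
```

Only the conflict list would need a human decision; every other change would flow back automatically.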

CEDICT has a more restrictive license than CedPane: if you combine CEDICT with another dictionary, you're forced to open-source the result, which means if your app has permission to use proprietary data but no permission to open-source it, then you can't use CEDICT in the same app unless you have an architecture that keeps dictionaries entirely separate (like Pleco does). CedPane on the other hand can be used in any app. But that does mean we have no right to take CEDICT contributions and put them into CedPane; we have to independently verify everything ourselves, and probably use our own definition wording too (I try not to look at any other dictionary before writing a definition, so if it ends up being the same then that's coincidence not copying). One entry doesn't sound like much, but if we get a reputation for being in the habit of copying other dictionaries' entries then that could mean trouble. One of the most awkward parts of crowd-sourcing is making sure all volunteers really had the right to supply the data they're supplying (I've seen quite a lot of proprietary data in Glosbe that I'm not sure should be there for example; I expect they'll start getting takedown notices for that eventually). It would be really rather nice if CedPane stays as confirmed public-domain so nobody tries to say "you have to take the whole project down because I found an example of infringement in it".

In the case of 报错, one of the examples actually given by that CEDICT contributor (from Zhihu) involves the phrase “各种bug和报错” and I'd say for this particular phrase 报错 is a noun and the existing CedPane definition “error (message)” is quite OK. So CedPane is not wrong, but it may be incomplete. 报错 can be translated "error (message)" in at least some contexts. What that CEDICT contributor is adding is that it can also be translated "to report an error" or "report wrong information" in other contexts.

As the "a" in CedPane stands for "additions", I'm really expecting CedPane to add to other dictionaries, not to replace them entirely (yes, there's a PD-English-Definitions.txt file, but that's meant as a last resort). So I'm not overly worried if a CedPane entry has one definition for a word that is correct in some contexts, while other definitions for the same word that are correct in other contexts can be found in other dictionaries: in that case, combining CedPane with another dictionary should get you both, and that's the intended use-case. For example, we have 波特 as "Potter" but the ABC has it as "baud". Normally I omit any entry that's also in the ABC because I don't want it to look like I'm copying the ABC, but if we're looking at a completely different definition then it's probably OK to add the separate entry without noting that the word can also mean the other thing, because I'm expecting another dictionary to be around to say that part.
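To illustrate that intended use-case, here is a minimal Python sketch that combines several headword-to-senses mappings so that one lookup returns every source's senses, tagged by source. The function name and the in-memory data are hypothetical; the 波特 senses are the ones mentioned in this comment, not data read from either real dictionary file.

```python
def merge_dictionaries(
    sources: list[tuple[str, dict[str, list[str]]]],
) -> dict[str, list[tuple[str, str]]]:
    """Combine {headword: [senses]} mappings from several named dictionaries.

    Each sense is tagged with the dictionary it came from. Senses appear in
    the order the sources are given, and a sense identical to one already
    supplied by an earlier source is not repeated.
    """
    combined: dict[str, list[tuple[str, str]]] = {}
    for source_name, entries in sources:
        for headword, senses in entries.items():
            bucket = combined.setdefault(headword, [])
            seen = {sense for _, sense in bucket}
            for sense in senses:
                if sense not in seen:
                    bucket.append((source_name, sense))
                    seen.add(sense)
    return combined


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for two dictionaries.
    cedpane = {"波特": ["Potter"]}
    other_dictionary = {"波特": ["baud"]}
    merged = merge_dictionaries([("CedPane", cedpane), ("other", other_dictionary)])
    for source_name, sense in merged["波特"]:
        print(f"{source_name}: {sense}")
```

A real merge would probably also key on pinyin, since the same characters can be read differently as different words.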

Having said that, it's probably OK to add more definitions to 报错 on the basis of sources, but now I've got to wait until I've forgotten what wording that CEDICT contributor used so I can write my own wording because I don't want to be copying CEDICT.

I do have a field in my private Wenlin database for "general notes" where I sometimes put example sentences I've seen, but I've not included this in CedPane as it'll need a lot of editing before publishing (some of my original sources are not public at all but I've confirmed it by searching public sources after; I don't want to risk private messages from Chinese friends ending up in this thing without their consent).

If someone has a lot to say about a word then it probably makes sense to use a platform like Wiktionary which already has the infrastructure to take it, unless there's a licensing issue.

chinese-words-separator commented 11 months ago

> In the case of 报错, one of the examples actually given by that CEDICT contributor (from Zhihu) involves the phrase “各种bug和报错” and I'd say for this particular phrase 报错 is a noun and the existing CedPane definition “error (message)” is quite OK

They overlooked that. In some languages brevity is key: in the phrase above, 报错 stands alone without being paired with another word, yet the context still implies that it takes the meaning of a noun.

In Filipino, when asking the bus conductor, we say "Magkano (how much) hanggang (up to) McDonalds?", in English "How much is it to McDonalds?". But most often we just say "Magkano (how much) McDonalds?", which literally translates as "How much is McDonalds?". We are not buying McDonalds; it is implied that we are asking how much the bus fare is up to McDonalds.

> I don't want to risk private messages from Chinese friends ending up in this thing without their consent.

Understandable.

> If someone has a lot to say about a word then it probably makes sense to use a platform like Wiktionary which already has the infrastructure to take it, unless there's a licensing issue.

Maybe CedPane could use something like that, so that notes and context can be attached to a word's definition.