Kimeiga closed this issue 8 months ago.
One interesting observation: I believe the example sentences are misassigned in entire groups at a time. In Shirabe Jisho it is clear that all the "multiply" kakeru examples are grouped with the "spend time" kakeru examples, and all the "sit down" kakeru examples are mixed in with the "insurance" ones.
After some research, I'm not sure, but I suspect this happened because entries have been removed from JMdict and others added, and perhaps this caused a series of off-by-one errors over time that shifted these example sentence groups around.
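The off-by-one hypothesis can be illustrated with a small, hypothetical model. The assumption here is that each sentence link stores only a 1-based sense number for an entry, and that nothing updates those numbers when the entry's sense list is edited. The glosses and sentence IDs below are made up and do not reflect the actual JMdict sense order for 掛ける.

```python
from typing import Dict, List, Optional

# Hypothetical model: a link is just (sentence ID -> 1-based sense number).
# Nothing renumbers these links when the entry's senses change.


def resolve(senses: List[str], links: Dict[int, int]) -> Dict[int, Optional[str]]:
    """Map each sentence ID to the gloss at its linked sense number (None if out of range)."""
    return {
        sid: senses[num - 1] if 1 <= num <= len(senses) else None
        for sid, num in links.items()
    }


# Made-up, simplified sense list; NOT the real JMdict sense order for 掛ける.
senses_before = ["to hang", "to put on", "to pour", "to spend (time)", "to multiply", "to sit down"]
# The same entry after an editor deletes sense 2 ("to put on").
senses_after = [s for s in senses_before if s != "to put on"]

# Hypothetical sentence IDs, linked against the OLD sense numbering.
links = {101: 4, 102: 4, 103: 5, 104: 5}

if __name__ == "__main__":
    print(resolve(senses_before, links))
    print(resolve(senses_after, links))
```

With this data, deleting one sense makes the "spend (time)" sentences resolve to "to multiply" and the "multiply" sentences resolve to "to sit down". Whole groups shift together, which is consistent with the grouped misassignments observed in Shirabe Jisho.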
Another thought, related to your point on #37: jreibun might come out soon and be a better source of sentences than the Tanaka Corpus anyway, although I'm not sure when it will be released.
> Hi Stephen! Amazing work, thank you for contributing to the world's knowledge!
Thanks, I'm always glad to hear that people like the project.
> If you look up the dictionary entry for 掛ける in Jitendex, there are many examples of sentences from Tanaka Corpus being assigned to the wrong sense.
Yes, these errors are very common. I have probably fixed a couple hundred of them over the past year.
> I have an account with Tatoeba, but I'm afraid I don't know how to edit the sentences
Tatoeba has a very primitive GUI for editing the links to JMdict entries. It is technically open to the public, but it is extremely user-unfriendly and difficult to use correctly.
Feel free to let me know when you spot these errors and I'll go fix them. A couple of other users have also been reporting these errors to me in the discussion forum.
> After some research, I'm not sure, but I suspect this happened because entries have been removed from JMdict and others added, and perhaps this caused a series of off-by-one errors over time that shifted these example sentence groups around.
That is indeed a common reason for the errors. Whenever entries in JMdict are edited, the editors need to remember to update the sentence links as well. We try to keep this in mind, but sometimes we forget. I recently suggested that some of this sentence information should be displayed in the JMdict database editor to make it easier to remember, but this is a volunteer project and things don't always move quickly.
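The bookkeeping that editors need to remember can be sketched as a small renumbering step. This is a hypothetical helper under a simplifying assumption (each link stores a 1-based sense number); it is not how the JMdict editor or Tatoeba actually store or update links.

```python
from typing import Dict, List, Tuple


def renumber_after_deletion(links: Dict[int, int], deleted: int) -> Tuple[Dict[int, int], List[int]]:
    """Renumber hypothetical sentence->sense links after sense `deleted` is removed.

    Links above the deleted sense move down by one. Links that pointed at the
    deleted sense itself cannot be fixed mechanically, so they are returned
    separately for a human to review.
    """
    updated: Dict[int, int] = {}
    needs_review: List[int] = []
    for sid, num in links.items():
        if num == deleted:
            needs_review.append(sid)
        elif num > deleted:
            updated[sid] = num - 1
        else:
            updated[sid] = num
    return updated, needs_review


if __name__ == "__main__":
    # Hypothetical sentence IDs -> sense numbers; sense 2 is being deleted.
    print(renumber_after_deletion({100: 2, 101: 4, 103: 5, 104: 1}, 2))
```

Inserting a new sense is the mirror case: links at or above the new position would move up by one. Either way, the edit and the renumbering have to happen together, which is why forgetting the second step leaves whole groups pointing at the wrong sense.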
> Another thought, related to your point on #37: jreibun might come out soon and be a better source of sentences than the Tanaka Corpus anyway, although I'm not sure when it will be released.
It's been almost a year since the last public update from that project, so I'm not sure how soon that will be. Fingers crossed.
Hi Stephen! Amazing work, thank you for contributing to the world's knowledge!
I have noticed some issues with the Tanaka Corpus and am not sure where to discuss them, but since I intend to use Yomitan as my popup dictionary of choice for some time, I figured I would mention it here. Of course, this problem also comes up in other projects that use the Tanaka Corpus (e.g. Shirabe Jisho for iOS).
If you look up the dictionary entry for 掛ける in Jitendex, there are many examples of sentences from Tanaka Corpus being assigned to the wrong sense.
Sense 9 means "to multiply", but the "multiply" sentence is included with sense 5.
Sense 11 means "to take a seat" and includes the correct reference, but the example sentence appears with sense 22, "to apply (insurance)".
Is there anything that can be done about this?
I read on the EDRDG wiki that the Tanaka Corpus is now within Tatoeba and it is its new "home". Does this mean each time we see something like this, we should correct it there?
Here's one of those sentences:
https://tatoeba.org/en/sentences/show/236991
I have an account with Tatoeba, but I'm afraid I don't know how to edit the sentences. Even if I did, would I be able to change the information that links a sentence to one of the senses in JMdict?
Just bringing this to your attention in case it is not possible to fix things at the source (the Tanaka Corpus itself), in which case we might need to keep a file in Jitendex with all the manually assigned corrections, or something similar.
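A local corrections file could, for example, be overlaid on the upstream links at build time. The sketch below is hypothetical: the JSON format, the `apply_corrections` helper, and all entry and sentence IDs are invented for illustration, and Jitendex may handle this differently or not at all. Corrections whose link no longer exists upstream are reported instead of applied, so stale fixes get noticed after the source changes.

```python
import json
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # (entry ID, sentence ID); both hypothetical here

# Hypothetical corrections file; format and IDs are made up for illustration.
CORRECTIONS_JSON = """
[
  {"entry": 9999991, "sentence": 900001, "sense": 9},
  {"entry": 9999991, "sentence": 900002, "sense": 11}
]
"""


def apply_corrections(
    upstream: Dict[Link, int], corrections_json: str
) -> Tuple[Dict[Link, int], List[Link]]:
    """Overlay manual sense corrections on upstream (entry, sentence) -> sense links.

    Returns the merged links plus any corrections whose link is missing upstream,
    so a maintainer can check whether those corrections are obsolete.
    """
    merged = dict(upstream)
    stale: List[Link] = []
    for rec in json.loads(corrections_json):
        key = (rec["entry"], rec["sentence"])
        if key in merged:
            merged[key] = rec["sense"]
        else:
            stale.append(key)
    return merged, stale


if __name__ == "__main__":
    # Hypothetical upstream links reproducing the reported misassignments (senses 5 and 22).
    upstream = {(9999991, 900001): 5, (9999991, 900002): 22, (9999991, 900003): 4}
    merged, stale = apply_corrections(upstream, CORRECTIONS_JSON)
    print(merged)
    print(stale)
```

Keeping corrections outside the upstream data means they survive rebuilds, but they still need periodic review, since a later fix at the source (or a renumbered entry) can make an override wrong.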