stephenmk / Jitendex

A free, offline, and openly licensed Japanese-to-English dictionary. Updates weekly!
https://jitendex.org
Creative Commons Attribution Share Alike 4.0 International

とき definition prioritisation within Yomitan? #98

Closed Klambyyy closed 2 months ago

Klambyyy commented 2 months ago

When scanning the kana とき, it feels a little odd that I'm getting these two definitions above 時 - is this prioritisation controlled by Jitendex? Apologies if I have posted this in the wrong place.

[image: screenshot of the search results for とき]

stephenmk commented 2 months ago

> Apologies if I have posted this in the wrong place.

All good.

> is this prioritisation controlled by Jitendex?

Yes and no. You'll notice that the entry for 時(とき) has a "priority form" tag and a star (★) tag. These tags are applied to the most common, frequently used entries. The dictionary data also assigns that entry a higher "score" (a numeric value) to indicate that it should be shown higher in the search results. So in principle, Yomitan could use the score to decide that the 時(とき) entry should be shown first. It didn't, because the entry for 時(とき) does not contain "とき" by itself as an independent form; it merely contains とき as a reading for 時. Because you searched for "とき", Yomitan currently puts the other two entries, in which とき is an independent form, ahead of it.
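To make the ordering concrete, here is a minimal sketch of an "exact match first, then score" sort. It is not Yomitan's actual code; the `Entry` fields, the score values, and the two placeholder entries are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    label: str
    terms: tuple      # independent headword forms (hypothetical field)
    readings: tuple   # kana readings (hypothetical field)
    score: int        # higher means "should rank higher" (made-up values)


def rank_exact_first(entries, query):
    """Keep entries that match the query, then sort exact-term matches
    ahead of reading-only matches, breaking ties by descending score."""
    matches = [e for e in entries if query in e.terms or query in e.readings]
    return sorted(matches, key=lambda e: (query not in e.terms, -e.score))


ENTRIES = [
    Entry("時(とき)", ("時",), ("とき",), 100),
    Entry("other entry A", ("とき",), ("とき",), 10),
    Entry("other entry B", ("とき",), ("とき",), 5),
]

for entry in rank_exact_first(ENTRIES, "とき"):
    print(entry.label, entry.score)
```

Under these assumptions the two placeholder entries come out ahead of 時(とき) despite its much higher score, which matches the behaviour described above.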

So there are a couple of options that could "fix" this issue. One is for me to add "とき" as an independent term for the 時(とき) entry. However, it's not clear how I would decide which entries should have their readings as independent forms and which should not. I don't have any data that would tell me the 時(とき) entry in particular should get one, and most of the time the word 時 is written with kanji.

The other option is for the Yomitan developers to change the sorting algorithm for search results so that 時(とき) would appear first since it has a higher "score" value even though it's not an "exact" match with "とき."
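As a rough sketch of what that alternative could look like, the sort key below puts the score first and uses the exact match only as a tie-breaker. Again, this is not Yomitan's real algorithm; the `Entry` shape, the scores, and the placeholder entries are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    label: str
    terms: tuple   # independent headword forms (hypothetical field)
    score: int     # higher means "should rank higher" (made-up values)


def rank_score_first(entries, query):
    """Sort by descending score; an exact term match only breaks ties."""
    return sorted(entries, key=lambda e: (-e.score, query not in e.terms))


# Entries assumed to have already matched the search for "とき".
MATCHED = [
    Entry("other entry A", ("とき",), 10),
    Entry("other entry B", ("とき",), 5),
    Entry("時(とき)", ("時",), 100),
]

for entry in rank_score_first(MATCHED, "とき"):
    print(entry.label, entry.score)
```

With this key the 時(とき) entry moves to the top. The trade-off is that a high-scoring entry would outrank an exact match for every search, which may not always be what the user wants.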

To be honest, I'm not sure this is a problem that urgently needs to be solved. Sometimes dictionary search results will contain extra info that you don't need, and you just have to spend a little time figuring out which results make sense based on the context. Just an unfortunate fact of life.

Klambyyy commented 2 months ago

That all seems completely fair to me. Thank you for taking the time to respond at length, and thank you for Jitendex as a whole!