themoeway / yomitan

Japanese pop-up dictionary browser extension. Successor to Yomichan.
https://chromewebstore.google.com/detail/yomitan/likgccmbimhjbgkjambclfkhldnlhbnn
GNU General Public License v3.0

part-of-speech virtually always "Unknown" when creating Anki vocabulary notes #988

Open mlidbom opened 2 months ago

mlidbom commented 2 months ago

Description: With the jitendex dictionary installed, find the word 緩り and add it to Anki. Although the word has the part-of-speech "noun", the field in the Anki note shows "Unknown". The same goes for virtually all words I've added. I think it may have been populated in some rare cases, but I'm not sure.

Browser version: Latest Edge

Yomitan version: 24.5.14.1

Exported settings file: yomitan-settings-2024-05-22-21-31-26.json

stephenmk commented 2 months ago

I wrote about this two years ago.

> [T]he part-of-speech field only contains a limited and modified subset of [the part-of-speech tags]. [...] these values are used behind-the-scenes for de-conjugating words into their dictionary forms so that they may be queried by yomichan. Part-of-speech tags that are not used for de-conjugation are not added to this part-of-speech list. [...]
>
> [...] I don't think this {part-of-speech} handlebar should even exist. All of this information already exists in a complete form within the {glossary} field. The part-of-speech of a given word can also vary depending on the sense in which it is used. For example, 亜 can be a prefix or a noun.

The part-of-speech handlebar shouldn't even exist. That information is only used by yomitan for deinflecting words that may be inflected (adjectives and verbs). There's no need to display it to end users.

mlidbom commented 2 months ago

I make heavy use of the part-of-speech tagging when studying. It's important to me. Parsable metadata in a dedicated field is very different from the same information being present, in principle, in another field among reams of text.

To me the difference is vital, since the metadata is parsed by the Anki add-on that I'm developing and is used in many places to show abbreviated versions of the vocabulary information.

So, as is, I simply have to manually type it in for every word I add to Anki. This is a real pain.

stephenmk commented 2 months ago

The part-of-speech field within yomitan dictionaries exists to provide deinflection information to the yomitan parser. It was never intended to provide a parsable metadata field containing all of the part-of-speech information for a particular entry. (It wouldn't make a lot of sense to create such a field, because different senses within a particular entry may contain different part-of-speech tags.)

The handlebar produces "Unknown" so often because the dictionary files simply don't provide the information. So this aspect is not a bug in yomitan. What you are requesting is really a new feature to assist the Anki add-on you're developing.
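To make that behaviour concrete, here is a minimal Python sketch. It is not Yomitan's actual code. It assumes each entry carries a list of deinflection rule identifiers and that the field is built only from that list. The identifiers (`v1`, `v5`, `vs`, `vk`, `adj-i`), the label table, and the function name are illustrative assumptions; only the labels "Ichidan verb", "Godan verb", and "Unknown" are mentioned in this thread.

```python
# Hypothetical label table: the rule identifiers and most labels are
# assumptions made for illustration, not Yomitan's real data.
RULE_LABELS = {
    "v1": "Ichidan verb",
    "v5": "Godan verb",
    "vs": "Suru verb",
    "vk": "Kuru verb",
    "adj-i": "I-adjective",
}


def part_of_speech_field(rules):
    """Build a part-of-speech string from deinflection rules only.

    Rules without a known label are skipped (a choice made for this
    sketch). If nothing is left, the result is "Unknown".
    """
    labels = []
    for rule in rules:
        label = RULE_LABELS.get(rule)
        if label is not None and label not in labels:
            labels.append(label)
    return ", ".join(labels) if labels else "Unknown"


# A verb entry carries a deinflection rule, so it gets a label.
print(part_of_speech_field(["v1"]))  # Ichidan verb

# A noun does not inflect, so under this assumption its rule list is
# empty and the field falls back to "Unknown".
print(part_of_speech_field([]))  # Unknown
```

Under these assumptions a noun can never produce "noun": the glossary may say so, but the rule list that feeds the field has nothing to say about it.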

mlidbom commented 2 months ago

I think the field should be populated with the union of all the part-of-speech tags from the senses. It seems to be the only thing that makes sense for a field that represents the entry as a whole, and it is exactly what I need and what I expect most people need: to know which word types the entry can be used as without reading through every sense, which takes far more time. It makes perfect sense to me to have such a field.

If, for some reason, this is unacceptable, then I agree that it would be better to remove support entirely than to have it populate "Unknown" all the time. But really, if anyone doesn't want this information in a separate field, they don't have to use the field. Some, me included, find it very helpful and would very much like it to be properly populated. I really don't see a downside to fixing it so that it is populated.
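The union proposed above could be sketched roughly as follows. This is only an illustration under a strong assumption: that each sense exposes its part-of-speech tags as a structured list. The `pos` key, the entry layout, and the function name are hypothetical, and nothing here reflects the real Yomitan dictionary format.

```python
def union_of_pos_tags(senses):
    """Collect part-of-speech tags across all senses of one entry.

    Tags are deduplicated and kept in first-seen order so the output is
    stable. Each sense is assumed to be a dict with an optional "pos"
    list (a hypothetical shape chosen for this sketch).
    """
    seen = set()
    result = []
    for sense in senses:
        for tag in sense.get("pos", []):
            if tag not in seen:
                seen.add(tag)
                result.append(tag)
    return result


# Illustrative entry modelled on the 亜 example quoted earlier in the
# thread: one sense tagged as a prefix, another as a noun.
entry_senses = [
    {"pos": ["prefix"]},
    {"pos": ["noun"]},
    {"pos": ["noun"]},  # a repeated tag is listed only once
]

print(", ".join(union_of_pos_tags(entry_senses)))  # prefix, noun
```

An entry whose senses carry no tags yields an empty list, so a caller would still need to decide what to show in that case.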

Kuuuube commented 2 months ago

Just tested a few words and this handlebar appears to work fine for what it does. I'm getting outputs like Ichidan verb or Godan verb. But you won't ever get noun as an output. That's just how it works.

I do think this isn't very good UX, and I agree with stephen that it shouldn't exist.

As for what you're requesting here... The data doesn't exist to give you that output. Unfortunately Yomitan is not magic.

Kuuuube commented 2 months ago

If you just want a list of all the dictionary "tags" that are within each gloss that might be possible. I'm not super familiar with the dictionary format to say for sure if that's reasonable. Unsure if these are custom defined by jitendex or if it's a standard thing we can pull out.

mlidbom commented 2 months ago

> If you just want a list of all the dictionary "tags" that are within each gloss that might be possible. I'm not super familiar with the dictionary format to say for sure if that's reasonable. Unsure if these are custom defined by jitendex or if it's a standard thing we can pull out.

That sounds good to me. I'm guessing that the worst that could happen is that some tags I don't care about come along for the ride. I would love it if this was implemented.

But if these "tags" can contain entries that are not POS information, then perhaps renaming the field would be a good idea....

stephenmk commented 2 months ago

For the record, I'm not convinced that feature (the union set of all PoS tags) would be useful and I'm not interested in adding it to jitendex.