pgaskin / dictutil

Tools, documentation, and libraries related to Kobo dictionaries.
https://pgaskin.net/dictutil
MIT License

Can't find words in Kobo dictionary generated by dictgen #16

Open Ceiyne opened 3 years ago

Ceiyne commented 3 years ago

This is my first time using dictgen, so I apologize if this is actually user error.

I have been trying to convert JMdict to a Kobo dictionary. I used Pyglossary to generate a df file from JMdict, and then used dictgen to create the Kobo dictionary. After installing it (using the custom-dict folder), the Kobo recognized the dictionary, but none of the words I looked up could be found, even though all of them are present in the JMdict source file.

When I ran dictgen, I used the default options. It ran without errors and said that it successfully wrote 190,800 entries.

I did a little troubleshooting but didn't come up with anything solid. Here are a few notes from what I checked:

  1. I looked at the df file that Pyglossary generated and it appeared to be in the correct format based on what I see on your documentation page. I also verified the entries I was trying to find in the book, and they were present in this file as well.
  2. I looked at the zip file that dictgen generated, and on the surface it looked like my other Kobo dictionaries. It contained many files with two-character names like xy.html, and those files contained unreadable data.
  3. I looked at your existing issues but didn't see anything similar. I saw one issue where your notes mentioned a "no words found" bug triggered by spaces in the dictionary filename, but I did not have spaces. I tried a few different names to make sure it wasn't a naming issue, things like: dicthtml-test.zip and dicthtml-test-test.zip
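
For anyone repeating check 2, here is a minimal Python sketch for inspecting a generated zip. It assumes, which this thread does not confirm, that the "unreadable" per-prefix .html entries are gzip-compressed; the function names `summarize_dict_zip` and `read_entry_text` are hypothetical helpers, not part of dictutil or Pyglossary.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def summarize_dict_zip(zip_bytes: bytes) -> dict:
    """Map each entry name in the zip to (size_in_bytes, looks_like_gzip)."""
    summary = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            data = zf.read(info.filename)
            summary[info.filename] = (len(data), data[:2] == GZIP_MAGIC)
    return summary


def read_entry_text(zip_bytes: bytes, name: str) -> str:
    """Return one entry as text, gunzipping it first if it looks like gzip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        data = zf.read(name)
    if data[:2] == GZIP_MAGIC:
        # Assumption: the entry is a complete gzip stream containing UTF-8 HTML.
        data = gzip.decompress(data)
    return data.decode("utf-8", errors="replace")
```

To try it on a real file, something like `summarize_dict_zip(open("dicthtml-test.zip", "rb").read())` lists every entry, and `read_entry_text(..., "xy.html")` shows whether a word you expect is actually inside the file for its prefix.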

Pyglossary is capable of creating Kobo dictionaries as well, so in case you were wondering why I didn't just do that... I tried that method but had issues there as well. With Pyglossary the generated dictionary worked to some extent -- the Kobo would return the correct dictionary entries for many words. But there were also a lot of words that could not be found despite being present in JMdict. So, I thought I'd try working with dictgen instead.

pgaskin commented 3 years ago

If this is the same Japanese issue reported on PyGlossary, note that the Kobo dictionary implementation in PyGlossary was derived from dictutil. 🙂

If it's not, can you provide a few examples of words which aren't found?

Also, just to check, what's your firmware version?

Ceiyne commented 3 years ago

Yeah, it's basically the same one. The one over there was from when I used Pyglossary to do the whole conversion, and the one here was when I used Pyglossary to make the df file and dictgen to create the dicthtml.zip.

I'm on the latest (as far as I know) firmware, 4.25.15875.

pgaskin commented 3 years ago

Yep, that would have essentially the same result since PyGlossary's logic is based on dictutil.

#14 is quite high up on my to-do list, but I haven't had enough contiguous free time to work on it yet. I'll probably end up doing it towards the middle of this year. Even though that doesn't implement the Japanese algorithms, it should be possible to work around the prefixing differences entirely as a temporary hack using the new prefix exception mechanism.