pgaskin / dictutil

Tools, documentation, and libraries related to Kobo dictionaries.
https://pgaskin.net/dictutil
MIT License

Dictionary handling changes in 4.24.15672 #14

Open pgaskin opened 3 years ago

pgaskin commented 3 years ago

I will need to test everything again and see which bugs have been fixed and what other changes have been made.

See pgaskin/kobopatch-patches#76 for some preliminary notes.


I will probably do this in two releases: a minor release later this week for the new installation process and list of pre-installed dictionaries, and a major one within the next few weeks for the new v3 format and matching rules. Each release will consist of documentation and tool updates.

pgaskin commented 3 years ago

I've merged #13 into this issue.

pgaskin commented 3 years ago

Notes about .kobo/custom-dict handling:

pgaskin commented 3 years ago

Notes about .kobo/dict handling:

pgaskin commented 3 years ago

Notes about word matching:

pgaskin commented 3 years ago

Notes about dictzip v3 vs v2:

pgaskin commented 3 years ago

Notes about built-in dictionaries:

pgaskin commented 3 years ago

Other notes:

pgaskin commented 3 years ago

I think I'm pretty much finished with finding the changes. I'll take another look once the new dictionaries are published, but I think I've found everything.

I still need to actually test and confirm the behaviour of prefix_exceptions.

The rest of the information in the comments above comes from a combination of reading the disassembly, hooking functions, and doing actual testing.

pgaskin commented 3 years ago

From my post on MobileRead:

prefix_exceptions is somewhat of a misnomer, since it doesn't actually make exceptions for prefixes. Instead, it should be called word_redirects, since it just changes the word being looked up to another if it matches exactly. The target file must still have a variant/word matching the new one, and the original file won't be looked in at all.

This also means that there's already a bug in prefix_exceptions, albeit the inverse of the problem prefix_exceptions was created to solve. Previously, with v2 dictionaries, variants whose prefix differed from their headword's wouldn't be found (I worked around this in dictutil by duplicating the entries). Now, if you have a headword with the same name as a redirected variant, it won't be found. For example, with the previous v2 behaviour, the entry for go/went would need to be duplicated into go.html and we.html, and you could also have another unique definition titled went in we.html. With the new v3 behaviour, you can just define go/went in go.html and add a redirect entry like "went\tgo". But this is where the new bug happens: if you had a second entry in we.html named "went" (remember that Kobo dictionaries support multiple entries for a word), it won't be found, since the word was redirected to "go". I can work around this bug by duplicating the headwords into the redirected files... which is just the counterpart to my previous workaround.
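
To make the go/went example concrete, here is a toy model of the lookup as described above. It is only a sketch, not Kobo's actual code: the function names, the dict-based entry layout, the simplified two-character prefix rule, and the assumption that redirects are stored one tab-separated pair per line are all made up for illustration.

```python
# Toy model of the redirect behaviour described above. Everything here is
# hypothetical and simplified; it is not how the firmware is implemented.

def prefix(word):
    # Assumption: the file prefix is just the first two characters,
    # lowercased. The real prefix rules have more special cases.
    return word.lower()[:2]


def parse_redirects(text):
    # Assumption: one "word<TAB>target" pair per line, like "went\tgo".
    redirects = {}
    for line in text.splitlines():
        if "\t" in line:
            source, target = line.split("\t", 1)
            redirects[source] = target
    return redirects


def lookup(files, redirects, word):
    # An exact match in the redirect table replaces the word before the
    # lookup, so the original word's own prefix file is never opened.
    target = redirects.get(word, word)
    matches = []
    for entry in files.get(prefix(target), []):
        if target == entry["headword"] or target in entry["variants"]:
            matches.append(entry["definition"])
    return matches


# go.html holds go/went; we.html holds a second, separate "went" entry.
FILES = {
    "go": [{"headword": "go", "variants": ["went"],
            "definition": "go: to move"}],
    "we": [{"headword": "went", "variants": [],
            "definition": "went: a second, unrelated entry"}],
}
REDIRECTS = parse_redirects("went\tgo\n")
```

In this model, `lookup(FILES, REDIRECTS, "went")` only finds the entry in go.html, and `lookup(FILES, {}, "went")` only finds the one in we.html. Neither returns both, which is the gap the duplication workarounds are meant to cover.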

pgaskin commented 3 years ago

The change made in 4.24.15676 appears to make it support multiple prefix exceptions and loop over them when looking up the definition. I will test it later today.

jackiew1 commented 3 years ago

@pgaskin Just a few additional notes:

pgaskin commented 3 years ago

15676 changes:

pgaskin commented 3 years ago

It appears they've crippled the old v2 dictionaries, at least the English one (they are now empty with a large file named "junk" filled with zeros). Presumably, the licensing expired for them. The file modification times show September 24 (the release date of 15676), but I don't think these were uploaded until October 1 (the release date for the new v3 dictionaries).