Open pgaskin opened 3 years ago
I've merged #13 into this issue.
Notes about `.kobo/custom-dict` handling:
- The filename must match `dicthtml*` or it will be ignored.
- If the filename is `dicthtml-XXX.zip`, where `XXX` is a valid language code, the dictionary will display as the language name plus `(Custom)`. This will still work correctly if there is already a built-in dictionary for that language.
- If the filename is `dicthtml-XXX-YYY.zip`, where `XXX` and `YYY` are valid language codes, the dictionary will be displayed as the two language names separated by `-` plus `(Custom)` at the end (i.e. the same rule, but for translation dictionaries).
- If the filename matches `dicthtml*` and does not match either of the two rules above, the display name will be the same as the filename (case is preserved).
- […] `Extra:`.
- […] `Extra:` causing no words to be found has been fixed.
- […] `Extra:` and `(Custom)`), it will trigger the bug where it can't find words in the dictionary (the same thing as what used to happen with `Extra:` dictionaries).
- […] `.zip` or be an extracted dictzip, otherwise it will trigger the same bug. Extracted dictionaries work fine with all features, but the HTML files will be imported as books. Extracted dictionaries will take priority over a packed one if they have identical names.
Notes about `.kobo/dict` handling:
- […] `(Custom)` part.
- […] `.zip` will be appended to the directory name when attempting to sync it).
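The `.kobo/custom-dict` naming rules above can be sketched roughly as follows. This is only an illustration of the rules as described in this issue, not the firmware's actual logic: the `LANGUAGE_NAMES` table is a hypothetical stand-in for Kobo's real language code table, the function name is made up, and the spacing around `-` and whether the `.zip` extension appears in the fallback name are assumptions.

```python
import re

# Hypothetical subset of the language table; the real one lives in the firmware.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "ja": "Japanese"}


def custom_dict_display_name(filename, languages=LANGUAGE_NAMES):
    """Return the assumed display name, or None if the file would be ignored."""
    if not filename.startswith("dicthtml"):
        return None  # does not match dicthtml*, so it is ignored

    # dicthtml-XXX.zip or dicthtml-XXX-YYY.zip
    m = re.fullmatch(r"dicthtml-([^-.]+)(?:-([^-.]+))?\.zip", filename)
    if m:
        codes = [c for c in m.groups() if c is not None]
        if all(c in languages for c in codes):
            # One name for a monolingual dictionary, two for a translation one.
            return " - ".join(languages[c] for c in codes) + " (Custom)"

    # Anything else matching dicthtml*: the filename is shown as-is (case preserved).
    return filename


print(custom_dict_display_name("dicthtml-en.zip"))     # English (Custom)
print(custom_dict_display_name("dicthtml-en-ja.zip"))  # English - Japanese (Custom)
print(custom_dict_display_name("dicthtml-OxAm.zip"))   # dicthtml-OxAm.zip
print(custom_dict_display_name("notes.zip"))           # None
```

A name that happens to collide with an entry in the real language table would take the first or second branch rather than the fallback.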
Notes about word matching:
Notes about dictzip v3 vs v2:
- […] `prefix_exceptions` file, presumably as a better way to handle variants with different prefixes (dictutil currently works around this by duplicating the definition). I think Kobo finally realized that their own dictionaries were affected by that bug...
- […] `prefix_exceptions` as a Marisa trie, then split by a tab char and take the second part as the new word to use in place of the original one (the original one won't be checked). (TODO: test this)
Notes about built-in dictionaries:
- `dicthtml-en-ja-pgs.zip` has been discontinued and will be automatically deleted on upgrade. I will still keep this in the list for dictutil for backwards compatibility.
Other notes:
- […] `dict:///` is still there, but it isn't really a big deal.
- […] `prefix_exceptions` handling will cause a different bug for complex dictionaries: If a word is a variant in one file and an actual entry in another, won't the `prefix_exceptions` handling for the variant cause it to ignore the actual entry in the original file for the word prefix?

I think I'm pretty much finished with finding the changes. I'll take another look once the new dictionaries are published, but I think I've found everything.
I still need to actually test and confirm the behaviour of `prefix_exceptions`.
The rest of the information in the comments above comes from a combination of reading the disassembly, hooking functions, and doing actual testing.
From my post on MobileRead:
prefix_exceptions is somewhat of a misnomer, since it doesn't actually make exceptions for prefixes. Instead, it should be called word_redirects, since it just changes the word being looked up to another if it matches exactly. The target file must still have a variant/word matching the new one, and the original file won't be looked in at all.
This also means that there's already a bug in prefix_exceptions, albeit one that is the inverse of the problem prefix_exceptions was created to solve. Previously, with v2 dictionaries, variants with a different headword prefix wouldn't be found (I worked around this in dictutil by duplicating the entries). Now, if you have a headword named after a redirected variant, it won't be found. For example, with the previous v2 behaviour, the entry for go/went would need to be duplicated into go.html and we.html, and you could also have another unique definition titled went in we.html. With the new v3 behaviour, you can just define it as go/went in go.html and add a redirect entry like "went\tgo" to redirect it. But this is where the new bug happens: if you had a second entry in we.html named "went" (remember that Kobo dictionaries support multiple entries for a word), it won't be found, since the word was redirected to "go". I can work around this bug by duplicating the headwords into the redirected files... which is just the counterpart to my previous workaround.
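To make the go/went example concrete, here is a small model of the two lookup behaviours as described in this post. It is a sketch under several assumptions: the prefix rule is simplified to "first two characters, lowercased", each HTML file is modelled as a plain dict from headword or variant to a list of entries, and the redirects are read from plain tab-separated strings rather than a real Marisa trie. All function names are made up for illustration.

```python
def prefix(word):
    # Simplified stand-in for the real prefix rule (an assumption).
    return word.lower()[:2]


def parse_redirects(lines):
    # Split each "from<TAB>to" line and keep the second part as the new word.
    redirects = {}
    for line in lines:
        source, _, target = line.partition("\t")
        if target:
            redirects[source] = target
    return redirects


def lookup_v2(files, word):
    # v2: only the file for the looked-up word's own prefix is checked.
    return files.get(prefix(word), {}).get(word, [])


def lookup_v3(files, redirects, word):
    # v3 as described above: an exact match replaces the word entirely,
    # and the original word's file is never consulted.
    target = redirects.get(word, word)
    return files.get(prefix(target), {}).get(target, [])


# v2 layout: go/went is duplicated into we.html next to the unique "went" entry.
v2_files = {
    "go": {"go": ["go/went"]},
    "we": {"went": ["go/went (duplicate)", "went (unique entry)"]},
}
# v3 layout: go/went lives only in go.html, with a redirect for "went".
v3_files = {
    "go": {"go": ["go/went"]},
    "we": {"went": ["went (unique entry)"]},
}
redirects = parse_redirects(["went\tgo"])

print(lookup_v2(v2_files, "went"))             # ['go/went (duplicate)', 'went (unique entry)']
print(lookup_v3(v3_files, redirects, "went"))  # ['go/went']
```

In the v3 case the unique "went" entry in we.html is never returned, which is the bug described above. Duplicating that entry under "go" in go.html would be the workaround.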
The change made in 4.24.15676 appears to make it support multiple prefix exceptions and loop over them when looking up the definition. I will test it later today.
@pgaskin Just a few additional notes:
- The `(Custom)` suffix is easily patchable in `libnickel.so.1.0.0.yaml`. I changed mine to `(*)` for brevity but I haven't added an official patch for now.
- […] `(Custom) English - Hebrew (in Hebrew chars)` - see attached.
- […] `dicthtml-NOA.zip` and ended up with something unreadable (to me). Presumably looking up `NOA` in their language code table resulted in an accidental 'hit'. I also tried something like `dicthtml-NewOxAmer.zip` (I can't remember exactly) and got something equally unreadable. `dicthtml-OxAm.zip` works OK.

15676 changes:
- […] `DictionaryParser::htmlForWord` take a second parameter which will override the prefix generation for the word. This has multiple (beneficial) implications:
  - […] `prefix_exceptions`, and it's a lot more clear how they are meant to be used.

It appears they've crippled the old v2 dictionaries, at least the English one (they are now empty with a large file named "junk" filled with zeros). Presumably, the licensing expired for them. The file modification times show September 24 (the release date of 15676), but I don't think these were uploaded until October 1 (the release date for the new v3 dictionaries).
I will need to test everything again and see which bugs have been fixed and what other changes have been made.
See pgaskin/kobopatch-patches#76 for some preliminary notes.
I will probably do this in two releases: A minor release for the new installation process and list of pre-installed dictionaries later this week, and a major one within the next few weeks for the new v3 format and matching rules. Each release will consist of documentation and tool updates.