Closed: gloverd closed this issue 3 months ago.
The documentation is somewhat confusing and needs updating. The code makes a copy of the book, sets the copy's language to English, and sends the copy to the Kindle, because Word Wise is only enabled for English books. If you set the book's language to English yourself, the plugin assumes the book is in English and only looks for English words.
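The copy-and-relabel approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the plugin's actual code: a plain dict stands in for calibre's metadata object, the language codes are illustrative, and the function name `prepare_kindle_copy` is invented. The key point is that the lookup language is read from the original metadata before the copy is relabelled, which is why manually setting a French book to English makes the plugin look up only English words.

```python
import copy


def prepare_kindle_copy(book_meta: dict) -> tuple[dict, str]:
    """Hypothetical sketch: return (copy sent to the Kindle, language used for lemma lookup)."""
    # Words are matched against the lemmas of the book's *original* language.
    lemma_lang = book_meta["language"]
    # Work on a deep copy so the library's own metadata stays unchanged.
    kindle_copy = copy.deepcopy(book_meta)
    # Kindle only enables Word Wise for English books, so only the copy is relabelled.
    kindle_copy["language"] = "en"
    return kindle_copy, lemma_lang


french_book = {"title": "Les Misérables", "language": "fr"}
sent, lookup = prepare_kindle_copy(french_book)
print(sent["language"], lookup, french_book["language"])  # en fr fr
```

If the user relabels the book as "en" before running the plugin, `lemma_lang` is already "en", so only English lemmas are searched.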
Could you upload the Word Wise database file created when the book language is French?
Since this issue is not related to the solved issue 141, I'll answer your questions here:
wordwise-lemmas folder in the calibre plugin folder: all downloaded Word Wise data files are saved there.
For some reason I can no longer run "Generate Word Wise" on .mobi books. I've tried clean installs of the plug-in and removing the associated folders under %APPDATA%, but it consistently just keeps running, whereas in the past it would at least complete: sometimes in seconds, but most often after a few minutes (as per the screenshots in #141). It will run on EPUB files. I wonder if I corrupted the book somehow as part of this... This is one of the previously generated files I had on my Kindle.
In order to upload it, I renamed the .kll file to .txt:
LanguageLayer.en.BBB2IHO521.txt
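Assuming the .kll file is an ordinary SQLite database (the maintainer calls it the Word Wise database file), a short sketch like this lists its tables with Python's standard sqlite3 module, so the contents can be checked before uploading. No particular schema is assumed, and the function name `list_tables` is made up for illustration.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in an SQLite database file, opened read-only."""
    # mode=ro: a missing file raises an error instead of silently creating an
    # empty database, and the Kindle file itself is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Usage: `list_tables("LanguageLayer.en.BBB2IHO521.kll")`, run from the folder that contains the file.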
In this one, for example, I see the French word "Morale" picked up with the gloss "Moral"; other pairs include (Talons, griffe), (Savants, savant), (Instant, immédiat, instantané), (unique, unique), ...
I think it may run so slowly with French books because the default settings enable too many lemmas.
I also fixed a bug for KFX books: https://github.com/xxyzz/WordDumb/commit/ba6582eb0453d0d2e72fa035de02f347c89939db. But you're using a MOBI book?
I've tried KFX, MOBI, and EPUB in the past. I have this running in the background right now: I downloaded a new out-of-copyright book (Les Miserables) as an EPUB file, converted it to MOBI, and am running only "Generate Word Wise" (not the full Word Dumb button). It has been running for 30 minutes at this point.
You may be onto something about the size, because I can run it for English books quite fast. As for trying with fewer lemmas: if I uncheck the "enabled" checkbox in the "Customize Kindle Word Wise" pop-up for a whole series of words, will that improve performance, or does the fact that it still has to look up each word before determining whether it is enabled prevent any significant improvement?
I disabled the lemmas at difficulty levels 5 and 4, and it finally produced the expected result. Some of the lemmas at level 5 are probably WAY too common in text (it has words like "it", "not/no", "the (plural)", "a"), and level 4 also has some very common words, so I'm sure they are bogging it down.
It took 5.5 hours to save the updated lemma file. I tried to export it and re-import it, but I don't think that's possible: the exported file doesn't seem to contain any information about the level or enabled status, and I'm not sure whether I can just rename it to import it.
After a computer restart, though, it no longer works. I am going through the process of re-saving the lemmas and will retry.
When the "Save" button is clicked, the code creates a file for spaCy to use later. Probably too many words are enabled by default, which makes this process very slow. You can use SQL to disable many rows in a single query on the database file worddumb-lemmas/fr/wiktionary_fr_fr_v0.db (with the SQLite command-line shell or https://sqlitebrowser.org):
UPDATE senses SET enabled = 0 WHERE difficulty < 3;
Then click the Save button; it should run faster. I should enable far fewer words by default, but I haven't found a better data source to convert into difficulty values.
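The same query can also be run from Python's built-in sqlite3 module, for example if no SQLite tool is installed. This is a sketch: the table and column names (`senses`, `enabled`, `difficulty`) and the threshold 3 come from the query above, while the function name `disable_easy_senses` is hypothetical. It is probably safest to run it while calibre is closed, so the database is not locked, and to back up the .db file first. It also counts how many senses remain enabled, since those are what the Save step has to process.

```python
import sqlite3


def disable_easy_senses(db_path: str, below: int = 3) -> tuple[int, int]:
    """Set enabled = 0 for every sense with difficulty < below.

    Returns (rows updated, senses still enabled afterwards).
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            updated = conn.execute(
                "UPDATE senses SET enabled = 0 WHERE difficulty < ?",
                (below,),
            ).rowcount
        (remaining,) = conn.execute(
            "SELECT COUNT(*) FROM senses WHERE enabled = 1"
        ).fetchone()
    finally:
        conn.close()
    return updated, remaining
```

Usage, with the path relative to calibre's plugin folder: `disable_easy_senses("worddumb-lemmas/fr/wiktionary_fr_fr_v0.db")`, then click Save in the plugin as described.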
The export feature is for creating Anki cards. Your settings for lemmas are saved to the db file.
That really has helped! Saving new lemmas went down from 330 minutes to 43 minutes, and per-book Word Wise generation is about 70% faster!
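For scale, the reported save times work out as follows; this is plain arithmetic on the numbers in this thread (330 minutes being the 5.5 hours mentioned earlier), not a new measurement.

```python
before_min = 330  # save time reported before disabling senses (5.5 hours)
after_min = 43    # save time reported after running the UPDATE query

reduction = 1 - after_min / before_min  # fraction of save time eliminated
speedup = before_min / after_min        # how many times faster the save is
print(f"{reduction:.0%} less time, {speedup:.1f}x faster")  # 87% less time, 7.7x faster
```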
I tested a French book in KFX and AZW3 formats, and both have working Word Wise now. But to enable better-quality French words by default, data similar to what is used to choose the default English and Chinese words is needed: https://github.com/xxyzz/Proficiency
https://github.com/xxyzz/WordDumb/commit/97394c94fdaf69f29ab097961ae34f01d6a37e0b should improve the speed of the save lemmas job. You can download the test version from here: https://github.com/xxyzz/WordDumb/actions/runs/8028950382
Describe the bug
The documentation says that the metadata language of non-English books will be changed to English automatically, but I don't think that is working when clicking the "Create Word Wise" sub-menu item:
It goes through the "Generating Word Wise" steps, but when you open the book on a Kindle, it does not show that Word Wise is available. Screenshots: book metadata, job details, Kindle screen showing no Word Wise.
If the metadata language is manually set to English, it does generate Word Wise as expected, which indicates that the language is not being switched. Screenshots: book metadata, job details, Kindle screen showing Word Wise now working.
I will open a separate issue for the Word Wise results, which, as you can see from the last screenshot, only look up English words. I don't think it's related to issue 141, but I have not been able to fix it with the 3.29.6 release from the artifacts.
System Information
OS: Windows 10
Calibre: 6.24.0
Python: 3.11
Plugin version: 3.29.6 (installed from artifacts)
Error message
Reproduce steps
Set the book's metadata language to a non-English language.
Click Create Word Wise in the Word Dumb dropdown menu.
Screenshots or videos
No response