xxyzz / WordDumb

A calibre plugin that generates Kindle Word Wise and X-Ray files for KFX, AZW3, MOBI and EPUB eBooks.
https://xxyzz.github.io/WordDumb/
GNU General Public License v3.0
386 stars, 19 forks

Book Language Metadata does not change to English if Only generating Word Wise #142

Closed: gloverd closed this issue 3 months ago

gloverd commented 1 year ago

Describe the bug

The documentation says that the metadata language of the book will be changed to English automatically for non-English books, but I don't think that is working when clicking the "Create Word Wise" sub-menu item.

It goes through the steps of "Generating Word Wise", but when you open the book on a Kindle, it does not show that Word Wise is available. (Screenshots: book metadata, job details, and the Kindle screen showing no Word Wise.)

If the metadata language is manually set to English, then it does generate as expected, which indicates that the language is not being switched automatically. (Screenshots: book metadata, job details, and the Kindle screen showing Word Wise now working.)

I will be opening a separate issue for the Word Wise results, which, as you can see from the last screenshot, only look up English words. I don't think it's related to issue 141, but I have not been able to fix it with the 3.29.6 release from the artifacts.

System Information

OS: win10; calibre: 6.24.0; Python: 3.11; plugin version: 3.29.6 (installed from Artifacts)

Error message

No *Error* message appears.

Reproduce steps

  1. Set Book Metadata language to a non-English language.
  2. Click Create Word Wise in the WordDumb dropdown menu.
  3. Open Book on Kindle.

Screenshots or videos

No response

xxyzz commented 1 year ago

The documentation is kind of confusing and needs an update. The code makes a copy of the book, sets the language of the copied book to English, and sends this copy to the Kindle, because Word Wise is only enabled for English books. If you set the book language to English yourself, the plugin will assume the book is in English and only look for English words.

Could you upload the Word Wise database file created when the book language is French?

xxyzz commented 1 year ago

Since this issue is not related to the solved issue 141, I'll answer your questions here:

gloverd commented 1 year ago

For some reason I can no longer run "Generate Word Wise" on .mobi books. I've tried clean installs of the plugin and removing the associated folders under %APPDATA%, but it consistently just keeps running, whereas in the past it would at least complete, sometimes in seconds but most often in a few minutes (as per the screenshots in #141). It will run on epub files. I wonder if I corrupted the book somehow as part of this... This is one of the previously generated files I had on my Kindle.

In order to upload it, I renamed the .kll to .txt: LanguageLayer.en.BBB2IHO521.txt

In this one, for example, I see the French word "Morale" picked up with the gloss "Moral"; other pairs are (Talons, griffe), (Savants, savant), (Instant, immédiat, instantané), (unique, unique), ...

xxyzz commented 1 year ago

I think it may run so slowly with French books because the default settings have too many enabled lemmas.

I also fixed a bug for KFX books: https://github.com/xxyzz/WordDumb/commit/ba6582eb0453d0d2e72fa035de02f347c89939db, but you're using a MOBI book?

gloverd commented 1 year ago

I've tried KFX, MOBI, and EPUB in the past. I have this running in the background right now: I downloaded a new out-of-copyright book (Les Miserables) as an EPUB file, converted it to MOBI, and am running only "Generate Word Wise" (not the full WordDumb button). It has been running for 30 minutes at this point.

You may be onto something about the size, because I can run it for English books quite fast. As for trying with fewer lemmas: if I uncheck the enabled box for a whole series of words in the "Customize Kindle Word Wise" pop-up, will that improve performance, or does the fact that it still has to look up each word before determining whether it is enabled prevent significant improvements?

gloverd commented 1 year ago

I disabled the lemmas under difficulty 5 and 4, and it finally produced the expected result. Some of the lemmas in 5 are probably WAY too common in text (it has words like "it", "not/no", "the (plural)", "a"), and level 4 also has some very common words, so I'm sure they are bogging it down.

It took 5.5 hours to save the updated lemma file. I tried to export it and re-import it, but I don't think that's possible? The exported file doesn't seem to have any information about the level or enabled status, and I'm not sure if I can just rename it to enable its import.

After a computer restart, though, it no longer works. I am going through the process of re-saving the lemmas and will retry.

xxyzz commented 1 year ago

When the "save" button is clicked, the code creates a file for spaCy to use later. Too many words may be enabled by default, which would make this process very slow. You can disable a large number of rows with a single SQL query against the db file worddumb-lemmas/fr/wiktionary_fr_fr_v0.db (using the SQLite command-line tool or https://sqlitebrowser.org):

UPDATE senses SET enabled = 0 WHERE difficulty < 3;

Then click the save button; it should run faster. I should enable far fewer words by default, but I haven't found a better data source to convert to the difficulty value.
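
The same update can also be scripted instead of typed into a SQLite shell. Below is a minimal sketch using Python's standard sqlite3 module. It assumes only what the statement above implies: a `senses` table with `enabled` and `difficulty` columns. The function names and the `threshold` parameter are made up for illustration and are not part of WordDumb.

```python
import sqlite3
from pathlib import Path


def enabled_counts(db_path: Path) -> dict[int, int]:
    """Return {difficulty: number of enabled senses} to see where the bulk is."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT difficulty, COUNT(*) FROM senses "
            "WHERE enabled = 1 GROUP BY difficulty"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


def disable_senses_below(db_path: Path, threshold: int = 3) -> int:
    """Run the UPDATE from this thread; return how many rows it matched."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if the statement fails
            cur = conn.execute(
                "UPDATE senses SET enabled = 0 WHERE difficulty < ?",
                (threshold,),
            )
            return cur.rowcount
    finally:
        conn.close()
```

To use it, point `db_path` at wiktionary_fr_fr_v0.db inside the worddumb-lemmas/fr folder on your machine, for example `disable_senses_below(Path("worddumb-lemmas/fr/wiktionary_fr_fr_v0.db"))`. Calling `enabled_counts` before and after shows how many senses remain enabled at each difficulty. It is probably safest to copy the db file somewhere first, since the update overwrites your current settings.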

The export feature is for creating Anki cards. Your settings for lemmas are saved to the db file.

gloverd commented 1 year ago

That really has helped! Saving new lemmas down to 43m from 330m, and per-book word-wise generation about 70% faster!

xxyzz commented 1 year ago

I tested a French book in KFX and AZW3 formats, and Word Wise now works in both. But to choose a better-quality default set of enabled French words, data similar to what is used to choose the English and Chinese defaults is needed: https://github.com/xxyzz/Proficiency

xxyzz commented 8 months ago

https://github.com/xxyzz/WordDumb/commit/97394c94fdaf69f29ab097961ae34f01d6a37e0b should improve the speed of the save lemmas job. You can download the test version from here: https://github.com/xxyzz/WordDumb/actions/runs/8028950382