xxyzz / WordDumb

A calibre plugin that generates Kindle Word Wise and X-Ray files for KFX, AZW3, MOBI, and EPUB eBooks.
https://xxyzz.github.io/WordDumb/
GNU General Public License v3.0

Error creating x-ray file - log attached! #237

Open arpanghosh8453 opened 2 weeks ago

arpanghosh8453 commented 2 weeks ago


Describe the bug and copy the error message

Error message

calibre, version 7.12.0 (win32, embedded-python: True)
Tonnerre de Brest!: An error occurred, please copy error message then report bug at GitHub.

Starting job: Generating X-Ray for The Spare Room 
Job: "Generating X-Ray for The Spare Room" failed with error: 
Traceback (most recent call last):
  File "calibre\gui2\threaded_jobs.py", line 85, in start_work
  File "calibre_plugins.worddumb.parse_job", line 211, in do_job
  File "calibre_plugins.worddumb.utils", line 53, in run_subprocess
  File "subprocess.py", line 571, in run
subprocess.CalledProcessError: Command '['py', 'C:\\Users\\thisi\\AppData\\Roaming\\calibre\\plugins\\WordDumb.zip', '{"book_id": 725, "book_path": "D:\\\\Library\\\\Andrea Bartz\\\\The Spare Room (725)\\\\The Spare Room - Andrea Bartz_x_ray.epub", "mi": null, "book_fmt": "EPUB", "book_lang": "en", "useragent": "WordDumb/3.32.0 (https://github.com/xxyzz/WordDumb)", "plugin_path": "C:\\\\Users\\\\thisi\\\\AppData\\\\Roaming\\\\calibre\\\\plugins\\\\WordDumb.zip", "spacy_model": "en_core_web_md", "create_ww": false, "create_x": true, "asin": "", "acr": "", "revision": "", "kfx_json": null, "mobi_html": null, "mobi_codec": ""}', '{"use_pos": false, "search_people": true, "model_size": "md", "zh_wiki_variant": "cn", "mediawiki_api": "", "add_locator_map": true, "preferred_formats": ["EPUB", "KFX", "AZW3", "AZW", "MOBI"], "use_all_formats": false, "minimal_x_ray_count": 1, "choose_format_manually": false, "wiktionary_gloss_lang": "en", "kindle_gloss_lang": "en", "use_gpu": false, "cuda": "cu121", "use_wiktionary_for_kindle": false, "remove_link_styles": true, "python_path": "", "show_change_kindle_ww_lang_warning": true, "ca_wiktionary_difficulty_limit": 5, "cs_wiktionary_difficulty_limit": 5, "da_wiktionary_difficulty_limit": 5, "de_wiktionary_difficulty_limit": 5, "el_wiktionary_difficulty_limit": 5, "en_wiktionary_difficulty_limit": 5, "es_wiktionary_difficulty_limit": 5, "fi_wiktionary_difficulty_limit": 5, "fr_wiktionary_difficulty_limit": 5, "he_wiktionary_difficulty_limit": 5, "hr_wiktionary_difficulty_limit": 5, "it_wiktionary_difficulty_limit": 5, "ja_wiktionary_difficulty_limit": 5, "ko_wiktionary_difficulty_limit": 5, "lt_wiktionary_difficulty_limit": 5, "mk_wiktionary_difficulty_limit": 5, "nl_wiktionary_difficulty_limit": 5, "no_wiktionary_difficulty_limit": 5, "pl_wiktionary_difficulty_limit": 5, "pt_wiktionary_difficulty_limit": 5, "ro_wiktionary_difficulty_limit": 5, "ru_wiktionary_difficulty_limit": 5, "sl_wiktionary_difficulty_limit": 5, 
"sv_wiktionary_difficulty_limit": 5, "uk_wiktionary_difficulty_limit": 5, "zh_wiktionary_difficulty_limit": 5, "fandom": ""}']' returned non-zero exit status 1.

Called with args: (ParseJobData(book_id=725, book_path='D:\\Library\\Andrea Bartz\\The Spare Room (725)\\The Spare Room - Andrea Bartz_x_ray.epub', mi=<calibre.ebooks.metadata.book.base.Metadata object at 0x000001CB632BBE50>, book_fmt='EPUB', book_lang='en', useragent='WordDumb/3.32.0 (https://github.com/xxyzz/WordDumb)', plugin_path='C:\\Users\\thisi\\AppData\\Roaming\\calibre\\plugins\\WordDumb.zip', spacy_model='en_core_web_md', create_ww=False, create_x=True, asin='', acr='', revision='', kfx_json=None, mobi_html=None, mobi_codec=''),) {'notifications': <queue.Queue object at 0x000001CB632B3ED0>, 'abort': <threading.Event at 0x1cb632b14d0: unset>, 'log': <calibre.utils.logging.GUILog object at 0x000001CB632B1DD0>} 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\WordDumb.zip\__main__.py", line 37, in <module>
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\WordDumb.zip\parse_job.py", line 237, in create_files
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\WordDumb.zip\parse_job.py", line 763, in load_spacy
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\__init__.py", line 13, in <module>
    from . import pipeline  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\pipeline\__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\pipeline\attributeruler.py", line 8, in <module>
    from ..language import Language
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\language.py", line 43, in <module>
    from .pipe_analysis import analyze_pipes, print_pipe_analysis, validate_attrs
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\pipe_analysis.py", line 6, in <module>
    from .tokens import Doc, Span, Token
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\tokens\__init__.py", line 1, in <module>
    from ._serialize import DocBin
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\tokens\_serialize.py", line 14, in <module>
    from ..vocab import Vocab
  File "spacy\vocab.pyx", line 1, in init spacy.vocab
  File "spacy\tokens\doc.pyx", line 49, in init spacy.tokens.doc
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\spacy\schemas.py", line 195, in <module>
    class TokenPatternString(BaseModel):
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\main.py", line 286, in __new__
    cls.__try_update_forward_refs__()
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\main.py", line 807, in __try_update_forward_refs__
    update_model_forward_refs(cls, cls.__fields__.values(), cls.__config__.json_encoders, localns, (NameError,))
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\typing.py", line 554, in update_model_forward_refs
    update_field_forward_refs(f, globalns=globalns, localns=localns)
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\typing.py", line 529, in update_field_forward_refs
    update_field_forward_refs(sub_f, globalns=globalns, localns=localns)
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\typing.py", line 520, in update_field_forward_refs
    field.type_ = evaluate_forwardref(field.type_, globalns, localns or None)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thisi\AppData\Roaming\calibre\plugins\worddumb-libs-py3.12\pydantic\v1\typing.py", line 66, in evaluate_forwardref
    return cast(Any, type_)._evaluate(globalns, localns, set())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ForwardRef._evaluate() missing 1 required keyword-only argument: 'recursive_guard'

Plugin settings and reproduce steps

Creating an X-Ray file for an EPUB book

Generated files, screenshots or videos

No response

xxyzz commented 2 weeks ago

Same as #226. What are the pydantic and pydantic-core versions in the worddumb-libs-py3.12 folder? This happens because the latest Python 3.12 release changed the ForwardRef._evaluate() function, but I thought the latest pydantic had already fixed this error.
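To illustrate the incompatibility (a minimal sketch, not WordDumb's or pydantic's actual code; the helper name is mine): Python 3.12.4 made `recursive_guard` a keyword-only parameter of `ForwardRef._evaluate()`, so the positional call in the bundled pydantic v1 shim, visible in the traceback, raises `TypeError`. Newer pydantic releases pass it by keyword, roughly like this compatibility wrapper:

```python
from typing import ForwardRef


def evaluate_forwardref(ref: ForwardRef, globalns, localns):
    """Evaluate a ForwardRef across the Python 3.12.4 signature change."""
    try:
        # Python >= 3.12.4: recursive_guard must be passed by keyword.
        return ref._evaluate(globalns, localns, recursive_guard=frozenset())
    except TypeError:
        # Older interpreters accepted it positionally.
        return ref._evaluate(globalns, localns, frozenset())
```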

arpanghosh8453 commented 2 weeks ago

I think it's 2.7.1. Is there any way to fix it manually?

xxyzz commented 2 weeks ago

Delete the worddumb-libs-py3.12 folder and the plugin will download the latest packages.

arpanghosh8453 commented 2 weeks ago

I think I got it working, but the resulting file has some issues when opened with KOReader. First, the plugin generates a separate file and copies it to its destination, so it can't be combined with other plugins or plugboard for on-the-fly modification of the files. Also, KOReader identifies files by their hashes, which makes it hard to replace a file while keeping the reading progress and other syncs. What I'm suggesting is to show the X-Ray data in a popup, like a dictionary (as the Kindle native software already does).


KOReader shows a similar window for the dictionary when a word is selected, so I was wondering about a seamless integration: if you could store the X-Ray data in a separate sidecar-like file, KOReader could load it alongside the book and show the details in the popup. That would be a perfect solution, and KOReader is very popular on e-readers, so I think it would be great! Let me know what you think!

xxyzz commented 2 weeks ago

Kindle uses a SQLite file for Word Wise and X-Ray. I wrote some documentation of the database tables here: https://xxyzz.github.io/WordDumb/contributing/index.html

I'm not familiar with the KOReader sidecar file. I assume you want to support Kindle db files in KOReader?

arpanghosh8453 commented 2 weeks ago

So essentially you inject those SQLite files when sending the book? There is no modification of the book file itself?

KOReader can query SQLite files very well. We would just need some sort of linking between the files so they are identified properly; then KOReader could query whether any X-Ray data is available for the selected text.
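Such a lookup could be sketched as below. The table and column names (`entity`, `entity_description`, the `entity` foreign key) are assumptions drawn from the Kindle X-Ray schema the WordDumb documentation describes; verify them against an actual generated file before relying on them:

```python
import sqlite3


def lookup_selected_text(conn: sqlite3.Connection, selected: str):
    """Return (label, description) for an entity matching the selected
    word, or None if the X-Ray db has no entry for it.
    Schema names are assumptions; check a real X-Ray file."""
    return conn.execute(
        """
        SELECT e.label, d.text
        FROM entity AS e
        JOIN entity_description AS d ON d.entity = e.id
        WHERE e.label = ? COLLATE NOCASE
        """,
        (selected,),
    ).fetchone()
```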

Can I export the SQLite file WordDumb produces? How do you fetch the data? Is it generated locally or fetched using some kind of API?

xxyzz commented 2 weeks ago

Kindle book files are not modified unless the book doesn't have valid ASIN metadata, and in that case only the metadata is changed.

Run the plugin on any Kindle-format book without connecting a Kindle; the db file is in the same folder as the book. X-Ray entities are created using a spaCy NER model and the MediaWiki API: https://github.com/xxyzz/WordDumb/blob/5db487b9ddab07c84c48306aa4fce515e666609d/parse_job.py#L685

Please note that Kindle uses text offset locations within the entire book to find Word Wise and X-Ray data.
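A simplified sketch of that offset model (the helper is mine, and real Kindle offsets depend on how the book text is decoded): each entity occurrence is stored as a (start, length) pair into the full book text, so any other viewer would need to reproduce the same offsets to match the stored data.

```python
def entity_occurrences(book_text: str, entity: str) -> list[tuple[int, int]]:
    """Find every occurrence of an entity as (start, length) character
    offsets into the whole book text, mimicking how X-Ray data is keyed."""
    occurrences, start = [], 0
    while (idx := book_text.find(entity, start)) != -1:
        occurrences.append((idx, len(entity)))
        start = idx + len(entity)
    return occurrences
```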

arpanghosh8453 commented 2 weeks ago

I see, thank you. I will look into it and get back to you if I have an idea of how to implement it for KOReader.