noembryo / KoHighlights

KOHighlights is a utility for viewing KOReader's highlights and/or exporting them to simple text, html, csv or markdown files.
MIT License

book paths mangled #31

Closed. T-o-m-H-u closed this issue 4 months ago

T-o-m-H-u commented 4 months ago

I wanted to try KoHighlights, but… things went wrong with the book pathnames.

The books’ paths are cut down to an impossible absolute destination. I would expect the path to be either relative to some known directory, e.g. the CWD, or an absolute system path where the book was last found (not ideal). [screenshot: kohighlight]

The paths to the sidecar files are correct, although as absolute system paths they are perhaps also not ideal. [screenshot: luapaths]

It seems KoHighlights cuts away the Home folder from the doc_path in the lua file, in my case /mnt/ext1. How it knows about the homedir of koreader escapes me. I would expect the basename to be relative to the input path given to KoHighlights.

noembryo commented 4 months ago

You have to excuse my really basic knowledge of Linux, but I can't seem to understand the issue here.. Is it the appearance of the paths in the path column, or is something actually not working? Are the paths working? If you double-click a row, does the book open?

If you'd like a different formatting of the paths (and you can modify the main.py file), you can change line 945 from
    path_item = QTableWidgetItem(meta_path)
to one of
    path_item = QTableWidgetItem(abspath(meta_path))
    path_item = QTableWidgetItem(os.path.relpath(meta_path))
    path_item = QTableWidgetItem(os.path.realpath(meta_path))
and see what looks better..

I don't use a Linux distribution, but if there is something not working, I can check it in a VMWare image (that I keep for compiling). The problem, as I said, is that I know next to nothing about this OS..
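For readers unsure how the three suggested calls differ, here is a minimal sketch using only the standard library. The helper name `path_variants` and the sample path are made up for illustration and are not part of KOHighlights. Note that all three only change how a path is written; none of them can repair a path that points to the wrong place.

```python
import os


def path_variants(meta_path):
    """Return the three alternative renderings of the same path."""
    return {
        # Absolute and normalised; symbolic links are left as they are.
        "abspath": os.path.abspath(meta_path),
        # Relative to the current working directory.
        "relpath": os.path.relpath(meta_path),
        # Absolute, with symbolic links resolved.
        "realpath": os.path.realpath(meta_path),
    }


# Hypothetical sidecar path, relative to wherever the script is started.
for name, value in path_variants("news/document.sdr/metadata.pdf.lua").items():
    print(f"{name}: {value}")
```

The printed values depend on the directory the script is run from, which is why no fixed output is shown.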

If I totally missed your point, I'm sorry but you have to ELI5.. 😉

T-o-m-H-u commented 4 months ago

Sorry for being ambiguous. I know next to nothing about Python. KoHighlights actually cannot find the files. It’s not a cosmetic issue.

The Book path in the list indicates the problem. The book cannot possibly reside at /news/document.pdf, because the news directory is not located at /. The correct absolute path is /home/tom/nfs/koreader/news/document.pdf.

The Unix File System Hierarchy begins at /. Every pathname that has a leading slash starts at that root of the filesystem. (That’s why we call it root, not slash.) It’s strongly discouraged to litter the root with user files. There’s no equivalent in Windows, because storage devices are mounted to drive letters. The closest equivalent would probably be the position before the drive letter, which has no name. See: a very short introduction to the Unix File System Hierarchy at eecs.WSU.edu

The problem probably results from this entry in the sdr file: ["doc_path"] = "/mnt/ext1/news/document.pdf". The doc_path is extracted and somehow /mnt/ext1 is cut from it.

If the sidecar resides in the same directory as the document, the path should be cut down to just document.pdf and then be reconstructed using the directory where the sidecar was found.
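The reconstruction proposed here can be sketched as follows. This is only an illustration, not KOHighlights' actual code: the function name `rebuild_book_path` is hypothetical, and the layout is an assumption (the metadata file sits inside a `document.sdr` folder that is itself next to the book).

```python
import posixpath
from pathlib import Path


def rebuild_book_path(sidecar_file, doc_path):
    """Combine the book's file name with the folder that holds the .sdr folder."""
    # doc_path was written on the device, so treat it as a POSIX path
    # and keep only the file name, e.g. "document.pdf".
    book_name = posixpath.basename(doc_path)
    # Assumed layout: .../news/document.sdr/metadata.pdf.lua
    sdr_dir = Path(sidecar_file).parent
    # The book is expected one level up, next to the .sdr folder.
    return sdr_dir.parent / book_name


print(rebuild_book_path(
    "/home/tom/nfs/koreader/news/document.sdr/metadata.pdf.lua",
    "/mnt/ext1/news/document.pdf",
))
```

With the paths from this report, the device prefix /mnt/ext1 never enters the result, because only the file name is taken from doc_path.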

I don’t understand how KoHighlights finds the books, that’s why this is all conjecture.

noembryo commented 4 months ago

OK. We have a problem.. 💯 I played with it a little and I think I fixed it.

Use the attached version to see if it's working and tell me, so I can prepare an updated version.. (P2 is for Python2/PySide and P3 is for Python3/PySide2)
Attachments: KOHighlights P3.zip, KOHighlights P2.zip

noembryo commented 4 months ago

The problem with paths starts if the sidecar doesn't reside in the same directory as the document. There are some cases where the metadata is inside KOReader's History folder or in hash-based storage. That's why I was using the ["doc_path"] as the first choice for the book's path. Now I use it only if the book is not in the same folder as the sidecar folder..
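The lookup order described in this comment might look roughly like the sketch below. It is a guess at the logic, not the code shipped in the attached build; `resolve_book_path` is a hypothetical name, and the sidecar layout (metadata file inside a `.sdr` folder) is assumed.

```python
import posixpath
from pathlib import Path


def resolve_book_path(sidecar_file, doc_path):
    """Prefer a book lying next to the sidecar folder; otherwise trust doc_path."""
    sidecar_file = Path(sidecar_file)
    # Candidate: same file name as in doc_path, next to the .sdr folder.
    sibling = sidecar_file.parent.parent / posixpath.basename(doc_path)
    if sibling.is_file():
        return sibling
    # Sidecar kept elsewhere (e.g. a central history folder):
    # fall back to the path recorded by the reader.
    return Path(doc_path)


# The result depends on which files actually exist on disk.
print(resolve_book_path(
    "/home/tom/nfs/koreader/news/document.sdr/metadata.pdf.lua",
    "/mnt/ext1/news/document.pdf",
))
```

The fallback keeps centrally stored sidecars usable on the device itself, while sidecars stored next to their books resolve correctly wherever the folder is mounted.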

T-o-m-H-u commented 4 months ago

It works, thank you! The paths are now absolute. I hope it doesn’t make the files immovable.

On the inkpad, I keep the sidecars in the same dir as the files. If one wants to keep them centrally, doc_path would probably still be a good starting point.

The hash-based storage seems tricky, since pdf checksums, as koreader notes, might change frequently with highlighting.

archiving error

However, archiving throws the following error:

Traceback (most recent call last):
  File "/home/tom/.local/bin/KOHighlights P3/main.py", line 841, in on_archive
    data["stats"]["performance_in_pages"] = {}  # can be cluttered
    ~~~~^^^^^^^^^
KeyError: 'stats'

It seems related to koreader’s Reading statistics plugin (https://github.com/koreader/koreader/issues/2986). Is the plugin strictly needed? It’s currently disabled here, with no corresponding metadata in the sdr-lua-file.

metadata not read

Metadata is not extracted. All titles equal metadata.pdf or metadata.epub and all authors are OLD TYPE FILE.

Exiftool and pdfinfo print metadata correctly, though. I realize pdf-metadata is convoluted. Is there a metadata type that is guaranteed to be read, so it may be tested against?

noembryo commented 4 months ago

> However, archiving throws the following error:

You can check your metadata.xxx.lua to see if there is a ["stats"] key. You might be right about the statistics plugin. I will guard against errors like this, so you don't have to enable the plugin.
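A guard of the kind mentioned here could be as small as the sketch below. It assumes `data` is the sidecar content already parsed into a Python dict, as the traceback suggests; the function name `clear_page_stats` is made up for illustration and is not taken from main.py.

```python
def clear_page_stats(data):
    """Empty the bulky per-page statistics, but only if a 'stats' table exists."""
    stats = data.get("stats")
    if isinstance(stats, dict):
        stats["performance_in_pages"] = {}  # can be cluttered
    # Sidecars written without the statistics plugin are left untouched.
    return data


print(clear_page_stats({"highlight": {}}))
print(clear_page_stats({"stats": {"performance_in_pages": {1: 3}}}))
```

Using `dict.get` instead of `data["stats"]` is what avoids the KeyError when the key is missing.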

> metadata not read
> Metadata is not extracted. All titles equal metadata.pdf or metadata.epub and all authors are OLD TYPE FILE.

Well, I don't actually extract metadata from the files.. I just read the metadata.xxx.lua file. If there is no author/title/.. there, I don't have them either.. .. or are you saying that this info exists in the .lua file and KOHighlights can't read it..?

Try this new version: KOHighlights P3.zip

T-o-m-H-u commented 4 months ago

Thank you! This seems to wrap it up.

With the update the error regarding stats was replaced by an identical one regarding title. It seems KoHighlights needs a title to archive highlights.

With the Reading statistics plugin enabled (which writes the metadata to the sidecar, where KoHighlights can pick it up) title and author become available and archiving works.

So at the moment the stats plugin is necessary, unless KoHighlights would construct title and author by itself.
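One way KoHighlights could construct a title and author by itself is sketched below. Everything here is an assumption made for illustration: the function name, the placeholder author, and in particular the key names "title" and "authors" under "stats", which the thread does not confirm.

```python
from pathlib import Path


def title_and_author(data, book_path, unknown_author="Unknown"):
    """Use sidecar values when present, else fall back to the book's file name."""
    # Assumed (unconfirmed) layout: data["stats"]["title"] / ["authors"].
    stats = data.get("stats") or {}
    # "document.pdf" becomes "document" when no title is stored.
    title = stats.get("title") or Path(book_path).stem
    author = stats.get("authors") or unknown_author
    return title, author


# Sidecar written with the Reading statistics plugin disabled: no "stats" table.
print(title_and_author({}, "/home/tom/nfs/koreader/news/document.pdf"))
```

A file-name fallback is crude, but it would let archiving proceed without requiring the plugin.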

noembryo commented 4 months ago

> So at the moment the stats plugin is necessary, unless KoHighlights would construct title and author by itself.

Hmm.. I have to add this to the prerequisites then.. BTW, by stats do you mean the System statistics plugin?

T-o-m-H-u commented 4 months ago

Sorry, by 'stats' I only meant the Reading statistics plugin.

The System statistics plugin is disabled here. It seems not to be necessary, but I don’t know what it does.

noembryo commented 4 months ago

OK Thank you very much for your help 👍

noembryo commented 4 months ago

Fixed with v1.7.3.0