Open hwhw opened 10 years ago
The two pieces of code, extr.c and pdfattach, complement each other and were written to work around Amazon's limitation of rejecting non-PDF/mobi files submitted to their cloud storage, which is accessible directly from a Kindle device. Namely, one would take a DjVu (or mp3 or whatever) and use pdfattach to attach it (multiple files are supported) to a PDF file, then send this PDF to Amazon Cloud and access it from the Kindle. Then, using kindlepdfviewer, one would extract whichever files are needed and view them. This is the whole purpose of it.
The pdfattach utility is quite simple (almost trivial), but extr.c is, imho, a nice illustration of how to access PDF objects directly (using mupdf) and autonomously. IMHO, it should be removed only after such an "attachment API" is actually implemented in Koreader, not before.
PS. I assume you meant kindlepdfviewer when you wrote kindlevncviewer as this has nothing to do with VNC.
Yes, of course I meant KPV. OK, I agree. We should have the API for accessing attachments - since that is a useful feature in any case, I guess. I'm not sure if we should actively promote the Amazon Cloud storage, errm, feature, but that's a different issue.
Btw, in case someone looks at the actual source code of extr.c, I should mention that the obvious memory leak in the function save_attachments() is intentional (strdup(3) is called, but no free(3)), because this is a utility designed to be executed and exited, thus destroying its address space on termination. There is no free(3) for strdup(3) because it would slow down the program unnecessarily. But if the function is copied "as is" into a long-lived program like koreader, then the memory leak should be fixed first, obviously. Otherwise each save-attachment operation would leak a tiny bit of memory.
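For anyone doing that copy, here is a minimal sketch of the strdup(3)/free(3) pairing that a long-lived program would need. It is not the code from extr.c: the helper name attachment_path(), its parameters, and the '/'-to-'_' sanitizing step are all assumptions made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup(3) under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT taken from extr.c: builds "<dir>/<name>" with
 * every '/' in the attachment name replaced by '_'. The caller owns the
 * returned string and must free() it. Returns NULL if allocation fails. */
char *attachment_path(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    char *safe = strdup(name);      /* private copy that we may modify */
    if (safe == NULL)
        return NULL;
    for (char *p = safe; *p != '\0'; p++)
        if (*p == '/')
            *p = '_';

    size_t len = strlen(dir) + 1 + strlen(safe) + 1;  /* dir + '/' + name + NUL */
    char *path = malloc(len);
    if (path != NULL)
        snprintf(path, len, "%s/%s", dir, safe);

    free(safe);                     /* the step a run-once tool can skip */
    return path;
}
```

A caller would use the returned path to write the file and then free() it as well, so that repeated save operations do not accumulate allocations.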
I'm not sure if extr.c can still be compiled with Mupdf 1.5. It hasn't been compiled since Mupdf 1.4.
It uses the standard pdf_load_page()/pdf_open_stream()/pdf_dict_gets() interface that is unlikely to change in a million years, let alone in a minor revision upgrade from 1.4 to 1.5. Having said that, I haven't checked whether it still compiles or not.
Now, if you really must remove these two utilities, please go ahead and do it. I have created a separate repository here:
https://github.com/tigran123/pdf-attach-extract
So if you need them in the future as a reference when writing an attachment display/extraction API in koreader, you can always find them in the above repository.
It does not (compile): the API has changed.
It's probably not too hard to update; the API mostly just added an extra pointer or two here and there. (Except for the highlights; iirc that changed quite significantly but that's not relevant here.) I wouldn't necessarily rush to delete it, but it's worth noting that it's inspired by what used to be called mupdfshow, now pdfshow: https://github.com/ArtifexSoftware/mupdf/blob/6d4ff647eaaa70b35813f31fb5204ea7b668b9e9/source/tools/pdfshow.c
Wow, I expected the API not to change in a million years, but it did in just 12 :) But then again, when I left the research on neural networks in the early 1990s and switched to Linux kernel development I honestly did not expect that 30 years later I would be chatting to a very intelligent LLM, nor that I would write a chat system myself: http://sigmaai.zapto.org :)
I did some archeology: https://github.com/koreader/kindlepdfviewer/issues/487 https://github.com/koreader/kindlepdfviewer/pull/488
So there used to be functionality for this: pressing Alt+S saved all attachments on the current page into the directory.
Besides, removing some code I wrote from an opensource project may affect my free access to GitHub Copilot, which I have discovered only the day before yesterday and now am using all the time when working on my Sigma AI project :)
extr.c is code written by @tigran123 to build a mupdf-based attachment extractor. That was used with kindlevncviewer, but with koreader, we currently do not support such attachments.
My proposal would be to remove the code since now it's a bit distracting. On the other hand, there's no real need to do so other than keeping our code footprint small.
I would offer to introduce an "attachment API" based on the extr.c code into the mupdf interface, so that in the long run we could add extraction (and probably attachment listing) functionality to KoReader, too.