Open Hugo-Heagren opened 2 years ago
I like this idea a lot, though I would personally prefer to work with text entry rather than a complex visual interface. I worry that with large files it would be cumbersome to flip through all the page thumbnails. For instance, if I want to extract a book chapter to distribute to students, I would probably want to enter something more like 5,8, 44-72, 323-330 (for the title page, table of contents, chapter, and footnotes). Ideally the outline and bookmarks would be preserved, with an option to include or exclude annotations. Right now I usually do this with a long, complicated pdftk invocation which I have to look up every time. It would be amazing to do it from inside Emacs.
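To make the text-entry idea concrete, here is a minimal sketch of how a spec like the one above could be expanded into page numbers. The function name `parse_page_spec` is hypothetical and not part of pdf-tools or any tool mentioned in this thread; it assumes comma-separated items that are either a single page or an ascending `first-last` range.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "5,8, 44-72" into a flat list of 1-based pages."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(first, last + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(parse_page_spec("5,8, 44-72, 323-330"))
```

Whitespace around the items is ignored, so the spec can be typed loosely, as in the example above.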
I'm not sure I fully understand what you want but it seems like you might find pdf-virtual interesting. It describes itself as
A virtual PDF is a collection of pages, or parts thereof, of arbitrary documents in one particular order. This library acts as an intermediate between pdf-info.el and all other packages, in order to transparently make this collection appear as one single document.
I haven't tried this myself but from a quick glance at the source, it seems to me that the entrypoint is pdf-virtual-buffer-create.
I am not sure if this type of functionality has been implemented in the epdfinfo server (it could be, but I am not aware of it). I am not sure what kind of functionality pdf-virtual provides, but AFAIK eventually the functionality for deleting/reordering pages must be supported by the server, and I have not yet discovered such functionality (it might be there, but again I am not aware of it).
On the other hand, I have created an alternative server, written in python and using the pymupdf library. It also provides scripting functionality, you can read about it here. It is actually quite nice, as it already extends functionality by supporting arrow annotations and supporting EPUB documents (and as mentioned already it provides a scripting feature; using that I think it is already really easy to delete/reorder pages 'with text').
Although there is not much documentation for the new server (I think it is not really needed), the pymupdf library is really well documented. So I guess if you would like to work on implementing such functionality, the pymupdf server (called vimura server) together with the pymupdf documentation provides a good starting point.
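For anyone who wants to experiment, the following is a rough sketch of extracting and reordering pages with plain PyMuPDF. It does not use the vimura server or its scripting interface, whose API is not shown in this thread. The function names are hypothetical; `fitz.open`, `Document.select` and `Document.save` are PyMuPDF calls, and newer releases also allow `import pymupdf`.

```python
def to_zero_based(pages: list[int]) -> list[int]:
    """Convert 1-based page numbers to the 0-based indices PyMuPDF expects."""
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return [p - 1 for p in pages]


def extract_pages(src: str, dst: str, pages: list[int]) -> None:
    """Write the given 1-based pages of src, in the given order, to dst."""
    import fitz  # PyMuPDF; imported lazily so to_zero_based works without it

    doc = fitz.open(src)
    try:
        # Keep only the listed pages, in the listed order.
        doc.select(to_zero_based(pages))
        doc.save(dst)
    finally:
        doc.close()
```

Calling `extract_pages("book.pdf", "chapter.pdf", [5, 8, 44])` (file names are placeholders) would write a three-page file. How outlines and annotations survive the selection is something to check in the PyMuPDF documentation.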
Another good alternative would be to create a simple frontend for these things using one of the command line tools poppler-utils or mutool. I think the transient package (possibly combined with some other magit packages) could provide a really nice framework for this (as it does generally for command-line tools).
Unfortunately, I am not in the privileged position (actually completely the opposite) to do all the required work for free (I am considering setting up donations for it, although I am not sure if I will ever make time for that), but I will gladly help any enthusiastic users/developers who would like to extend the functionality.
To put it simply, pdf-virtual is a way to mix-and-match pages from various files into a single buffer. You can have a rearranged view of a file but the rearrangement is not done in-file, it is an illusion. To illustrate my point, paste the following string into a buffer and type M-x pdf-virtual-view-mode (you should change the string to a valid pdf filename),
;; %VPDF
(("1.pdf" 1 5 (10 . 20)))
this should present a buffer only with pages 1, 5, and 10–20 from "1.pdf".
You cannot add annotations and such, however; it simply signals an error. Better than nothing, I suppose.
In conclusion, as you noted, for actually reordering and deleting pages and such, epdfinfo must gain that ability. pdf-virtual is an okay-ish substitute for now.
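As an illustration of the spec format shown above, here is a small sketch that generates such a virtual-PDF description from a list of pages. It only mirrors the example in this comment (the header line, then a list of the file name followed by pages and `(first . last)` ranges); the function name `vpdf_spec` is hypothetical, and the sketch covers nothing of the format beyond what that example shows.

```python
def vpdf_spec(filename: str, pages: list) -> str:
    """Build a pdf-virtual style spec; pages holds ints or (first, last) tuples."""

    def lisp_string(s: str) -> str:
        # Escape backslashes and double quotes for a Lisp string literal.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    items = []
    for p in pages:
        if isinstance(p, tuple):
            first, last = p
            items.append(f"({first} . {last})")  # inclusive page range
        else:
            items.append(str(p))
    body = " ".join([lisp_string(filename)] + items)
    return f";; %VPDF\n(({body}))\n"


if __name__ == "__main__":
    print(vpdf_spec("1.pdf", [1, 5, (10, 20)]), end="")
```

The result can be pasted into a buffer and viewed with M-x pdf-virtual-view-mode as described above.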
Alternatively, one could make a transient or hydra for commands that shell out to pdftk or similar.
I'd also love to have one that runs OCR on the current pdf. But these seem like user modifications rather than core additions to the library.
I was just going to try this out a bit and paste the results here, but then the code grew into this: https://github.com/orgtre/qpdf.el
I followed the suggestions by @dalanicolai and @titaniumbones and created a transient interface to an external command-line tool. Poppler has basically no support for editing pdf files, so I think this is the best option. poppler-utils is also quite limited. pdftk and mutool are reasonable alternatives, but I ended up choosing qpdf. It seemed to be easiest to work with (great documentation) and I liked the flexible --pages specification, which allows subsetting and combining pages across multiple pdf files in many different ways.
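For readers unfamiliar with the `--pages` option, the sketch below assembles a qpdf command line of the form `qpdf --empty --pages FILE RANGE [FILE RANGE ...] -- OUTPUT`. It is not taken from qpdf.el; the function name `qpdf_pages_args` and the file names are hypothetical, and the command is only built, not run.

```python
import shlex


def qpdf_pages_args(inputs: list[tuple[str, str]], output: str) -> list[str]:
    """Build argv for qpdf from (file, page-range) pairs, in output order."""
    if not inputs:
        raise ValueError("at least one input file is required")
    args = ["qpdf", "--empty", "--pages"]
    for path, page_range in inputs:
        args += [path, page_range]
    args += ["--", output]  # "--" ends the --pages list
    return args


if __name__ == "__main__":
    argv = qpdf_pages_args(
        [("book.pdf", "5,8,44-72,323-330"), ("notes.pdf", "1-2")],
        "handout.pdf",
    )
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it. Building an argument list rather than a shell string avoids quoting problems with unusual file names.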
Please do provide feedback and suggestions, especially on which further options should be included (although for most, the --custom option which allows one to type any other option, combined with a completion framework with completion history, might be enough) and on how the page and file selection could be made more convenient (especially when one wants to use multiple files and page ranges). Also, do you think this should stand on its own or be merged into pdf-tools?
Reposting here. Originally posted here, but I was told this repo is the canonical one at the moment. See previous issue for some brief comments.
I would find it very useful to be able to reorder and remove pages from a pdf. Ideally, it would also be good to merge more than one file into a single pdf, and have all three operations available at once in a single interface. I'm thinking of the sort of functionality the navigation sidebars provide in some viewers: not just navigation between thumbnailed pages, but some editing ability of those pages.
I was thinking that the editing experience of magit's interactive rebase might be a good model for this. We could construct a tabulated-list-mode buffer with thumbnails of each page, and any relevant information about them (maybe their original page number for example), and provide a set of commands for reordering, removing, (perhaps inserting from other files) etc. Once you've got your list reordered how you want, press C-c C-c and the file is edited to reflect the list.
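One possible way to model the rebase-like list, sketched independently of any Emacs UI: each row remembers its source file and original page number, editing commands rearrange the rows, and the final list is compiled into (file, page-range) runs that a command-line tool could consume. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    source: str  # file the page comes from
    page: int    # original 1-based page number in that file


def initial_plan(source: str, page_count: int) -> list[PageEntry]:
    """One row per page, in original order."""
    return [PageEntry(source, n) for n in range(1, page_count + 1)]


def drop(plan: list[PageEntry], index: int) -> list[PageEntry]:
    """Return a copy of the plan without the row at index."""
    return plan[:index] + plan[index + 1:]


def move(plan: list[PageEntry], src: int, dst: int) -> list[PageEntry]:
    """Return a copy of the plan with the row at src moved to position dst."""
    plan = list(plan)
    plan.insert(dst, plan.pop(src))
    return plan


def compile_plan(plan: list[PageEntry]) -> list[tuple[str, str]]:
    """Collapse consecutive pages of one file into (file, "first-last") runs."""
    runs: list[list] = []  # each run is [source, first, last]
    for entry in plan:
        if runs and runs[-1][0] == entry.source and runs[-1][2] + 1 == entry.page:
            runs[-1][2] = entry.page  # extend the current run
        else:
            runs.append([entry.source, entry.page, entry.page])
    return [(s, str(a) if a == b else f"{a}-{b}") for s, a, b in runs]


if __name__ == "__main__":
    plan = initial_plan("a.pdf", 5)
    plan = drop(plan, 1)     # remove original page 2
    plan = move(plan, 3, 0)  # move original page 5 to the front
    print(compile_plan(plan))
```

Keeping the original page number in each row is what lets the buffer show where a page came from after it has been moved, as suggested above.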