mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Feature request: ToolTip preview of PDF links #5835

Closed alexshtf closed 5 years ago

alexshtf commented 9 years ago

I have been reading a lot of scientific papers lately, and I would like to see a small popup with the linked content when I hover over a PDF link. Usually, when there is a link to an equation or a theorem, I don't want to jump there; I just want a short reminder so I can continue reading. I believe that many people who read scientific papers would benefit.

It is something similar to Visual Studio's Peek Definition feature (http://channel9.msdn.com/Series/Visual-Studio-2012-Premium-and-Ultimate-Overview/Peek-Definition)

mbbaig commented 9 years ago

Hi. I'd like to work on this feature for pdf.js.

However, I'm not overly familiar with the code base or API.

Does the API provide a way to have an iframe load a URL? More generally, what would be a good starting point for this?

timvandermeij commented 9 years ago

There is nothing in PDF.js so far to help with this feature. I think this would have to be implemented from scratch. Each link is an annotation of type Link, so perhaps you could add an iframe on hover, much like how the Text annotations work. However, I think an iframe is not the best solution, both for security and usability. An image render would be much better, but I'm not sure how easy or difficult that is to implement.

mbbaig commented 9 years ago

Could the following js library be used to accomplish this? rasterizeHTML.js

timvandermeij commented 9 years ago

That looks like a better solution to me, yes.

yurydelendik commented 9 years ago

There are internal links, which point to some destination inside the same PDF, and external links. I think the latter should not have a preview (and should act like a hyperlink on a regular web page). Internal links point to some place on a PDF page, so the PDF.js rendering capabilities can be used to generate the required preview.
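A minimal sketch of how that internal/external split might be made in code. It assumes the annotation objects have the shape returned by `page.getAnnotations()`, where a Link annotation carries a `url` field for an external target or a `dest` field for an internal one. The function names `classifyLink` and `previewableLinks` are hypothetical and not part of PDF.js.

```javascript
// Classify one annotation object.
// Assumption: Link annotations expose `url` (external) or `dest` (internal).
function classifyLink(annotation) {
  if (!annotation || annotation.subtype !== "Link") {
    return "not-a-link";
  }
  if (typeof annotation.url === "string" && annotation.url.length > 0) {
    return "external";
  }
  if (annotation.dest !== undefined && annotation.dest !== null) {
    return "internal";
  }
  // Links that only trigger an action (for example "NextPage") end up here.
  return "unknown";
}

// Keep only the links that would get a hover preview under this proposal.
function previewableLinks(annotations) {
  return annotations.filter((a) => classifyLink(a) === "internal");
}
```

External links are deliberately left out of `previewableLinks`, so they keep behaving like ordinary hyperlinks.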

mbbaig commented 9 years ago

How does PDF.js differentiate between internal and external links? Would I modify the link annotations to achieve this from within PDF.js itself? If not, then which file should I look at?

mbbaig commented 9 years ago

Okay, so after inspecting a page loaded with a PDF that has both link types, I saw that it uses the annotLink class to mark a link and the internalLink class to differentiate between the link types.

So I'm thinking that I should make a toggle available to enable preview on hover, and then create a PDFPreviewer, like the var PDFViewerApplication in viewer.js.
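One piece such a previewer would probably need is a way to decide which part of the target page to show. Below is a rough sketch, assuming the destination is in the explicit array form `[pageRef, { name: "XYZ" }, left, top, zoom]` with coordinates in PDF user space (origin at the bottom-left of the page). The name `previewRegion` and the default preview size are hypothetical, chosen only for illustration.

```javascript
// Compute the rectangle of the target page to show in a hover preview.
// Assumption: `dest` is an explicit destination array; for non-XYZ modes,
// or when a coordinate is null, fall back to the top-left of the page.
function previewRegion(dest, pageWidth, pageHeight,
                       previewWidth = 400, previewHeight = 150) {
  const [, mode, left, top] = dest;
  const isXYZ = Boolean(mode) && mode.name === "XYZ";
  const x = isXYZ && typeof left === "number" ? left : 0;
  const yTop = isXYZ && typeof top === "number" ? top : pageHeight;

  // The preview can never be larger than the page itself.
  const width = Math.min(previewWidth, pageWidth);
  const height = Math.min(previewHeight, pageHeight);

  // Clamp so the rectangle stays fully inside the page.
  const x0 = Math.min(Math.max(x, 0), pageWidth - width);
  const y1 = Math.min(Math.max(yTop, height), pageHeight);

  // `y` is the bottom edge of the region, in PDF user space.
  return { x: x0, y: y1 - height, width, height };
}
```

The returned rectangle could then be used to crop or offset a page render so that only the linked equation or theorem is visible in the popup.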

Snuffleupagus commented 5 years ago

I'd suggest closing this as WONTFIX, for a number of reasons:

haichaoyu commented 5 years ago

Hi @Snuffleupagus, thanks for your explanation. I am also very interested in this feature, but only for previewing bibliography hyperlinks. Could you please give some hints on how I might implement this as a Chrome extension in JavaScript?

maxsu commented 5 years ago

This would be a valuable feature for an implementation of PDF.js that is specialized for science papers. @mbbaig, @haichaoyu - wanna brainstorm about this?

I think a simple tooltip, like Tooltip.js, would not be horrible for performance. However, I think the features already discussed are out of scope for a general PDF reader, and we may want to take the discussion to a more specialized context.

Allow me to propose some scope creep: only a small minority of science papers have internal links for their in-body citations (let alone cross-references). A more broadly applicable solution would parse in-text citations and let us "peek" at their associated bibliography items. I'd like to propose that as a parallel goal. Luckily there are tools and services that do this (see Resources, below). For inspiration, I think we can look at Readcube, a proprietary science paper reader that decorates papers with mouse-over bibliographic tooltips.

Components it'd need (scope creep labeled with *):

Note: 'links' refer to hyperlinks (with either internal or external target) present in the pdf, while 'references' are in-text citations of bibliography items, or cross-references to footnotes, tables, figures, and so on.

  1. Citation context detector: Detect links and references* in the document
  2. Target data extractor: Pull relevant materials from link targets; text from footnotes, bibliography reference strings, and external pages; do the same for references*
    • Bibliography string parsing: title, year, author, journal, etc
    • Citation matching: Match reference strings to external paper links, get abstracts, etc
    • External link preview: Extract title & snippet from external pages
  3. Preview formatter: Format and style extracted data
  4. Tooltip: Display preview text as a popup
  5. PDF Enhancer: Add tooltips to detected links and references*
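
As a very rough sketch of steps 1 and 2 for the simplest case, the snippet below detects numeric in-text citations such as `[3]` or `[1, 4-6]` and matches them to bibliography lines that start with the same bracketed number. It assumes the page text and bibliography lines have already been extracted as plain strings. All function names are hypothetical, and author-year citation styles are not handled.

```javascript
// Step 1 (sketch): find numeric citations and expand ranges like "4-6".
function findCitations(text) {
  const pattern = /\[(\d+(?:\s*[-–,]\s*\d+)*)\]/g;
  const results = [];
  for (const match of text.matchAll(pattern)) {
    const numbers = [];
    for (const part of match[1].split(",")) {
      const [start, end] = part
        .split(/[-–]/)
        .map((s) => parseInt(s.trim(), 10));
      const last = end === undefined ? start : end;
      // Skip implausibly large ranges rather than expanding them.
      if (last - start > 50) continue;
      for (let n = start; n <= last; n++) numbers.push(n);
    }
    results.push({ index: match.index, raw: match[0], numbers });
  }
  return results;
}

// Step 2 (sketch): index bibliography lines of the form "[12] Some entry".
function parseBibliography(lines) {
  const entries = new Map();
  for (const line of lines) {
    const m = /^\s*\[(\d+)\]\s*(.+)$/.exec(line);
    if (m) entries.set(Number(m[1]), m[2].trim());
  }
  return entries;
}

// Combine both: for each citation, collect the entry text a tooltip could show.
function tooltipsFor(text, bibliographyLines) {
  const bib = parseBibliography(bibliographyLines);
  return findCitations(text).map((c) => ({
    raw: c.raw,
    index: c.index,
    previews: c.numbers.filter((n) => bib.has(n)).map((n) => bib.get(n)),
  }));
}
```

Real papers would need the heavier extraction and matching tools listed under Resources; this only illustrates the data flow between the detector, the extractor, and the tooltip.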

Resources

Extraction tools like Crossref's pdfextract (retired), Cermine, Grobid, or Zotero's Recognizer Server would help identify and extract relevant text from bibliographic entities and in-text references.

Citation matching tools like EXmatcher would let tooltips present external paper links, citation metrics, and abstracts.

pdf.js-hypothes.is - a pdf.js distribution that supports collaborative research annotation. Perhaps a good jumping-off point. They need help!

In general, processing PDF references in this way is a little heavy - we may need to build a server runtime to host the context detector, extractor, and formatter. Processing documents with links could be much lighter. A possible approach would be to convert unlinked documents into linked documents via a server, and simply render the link metadata in the viewer.

Other suggestions and input welcome! This issue is almost certainly the wrong place to discuss this - feel free to reach out to me or suggest a new forum :)

echebbi commented 4 years ago

@maxsu @haichaoyu I would also be very interested in such a feature; did you discuss it further?

YijunYuan commented 2 years ago

I think this is a temporary solution.