mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Add support for "page=" and "search=" URL parameters #1875

Closed bf closed 8 years ago

bf commented 12 years ago

Both the Adobe Reader and Google Chrome PDF plugins accept several URL parameters which can be defined via the URL Fragment Identifier. Adobe has published a short overview on these parameters at http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf .

These optional parameters affect the behavior of the PDF plugin in several ways. The two most commonly used parameters - at least from my point of view - are page= and search=.

Both of these parameters seem straightforward to implement and would create compatibility with use cases where these parameters are used, e.g. in third-party web applications.

Please note that I have searched the issue list for similar feature requests and found nothing relevant, which is why I have created this ticket.

yurydelendik commented 12 years ago

Currently #page= is supported. Also, pdf.js has limited support for the zoom and scroll offset parameters.

toddzebert commented 12 years ago

I have a client looking for either search= in the URL or even in the API.

yurydelendik commented 12 years ago

> I have a client looking for either search= in the URL or even in the API

We recently landed find functionality. If you have time, check the latest code and add support for search=.

toddzebert commented 12 years ago

I'm willing to give it a try, but could someone give me a hint on where to start? I saw the search commit. Looks like web/viewer.js includes the search view code.

Which file contains the parsing of the URL parameters?

Also, the use case needs to specify the nth match.

Thanks!

jviereck commented 12 years ago

> Which file contains the parsing of the URL parameters?

It's done in viewer.js as well. Take a look at the pdfViewSetHash() function.
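For orientation, parsing of this kind can be sketched roughly as follows. This is a minimal sketch and not the viewer's actual code: `parseHashParams` is a hypothetical helper name, and it assumes the fragment consists of `key=value` pairs separated by `&`.

```javascript
// Hypothetical helper, not the viewer's own parser: turn a URL fragment such
// as "#page=3&zoom=150" into an object of decoded, lower-cased keys and
// decoded values.
function parseHashParams(hash) {
  const params = {};
  const text = hash.startsWith("#") ? hash.slice(1) : hash;
  for (const part of text.split("&")) {
    if (part === "") {
      continue; // skip empty segments, e.g. a bare "#"
    }
    const eq = part.indexOf("=");
    const key = eq === -1 ? part : part.slice(0, eq);
    const value = eq === -1 ? "" : part.slice(eq + 1);
    // decodeURIComponent throws on malformed escapes such as a lone "%".
    params[decodeURIComponent(key).toLowerCase()] = decodeURIComponent(value);
  }
  return params;
}

const exampleParams = parseHashParams("#page=3&zoom=150");
console.log(exampleParams.page); // prints 3 (the value is a string)
console.log("zoom" in exampleParams); // prints true
```

All values come back as strings, so a page number would still need to be converted and range-checked before use.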

> Also, the use case needs to specify the nth match.

I don't get this bit - could you explain it a little bit more?

Best,

Julian

toddzebert commented 12 years ago

OK on pdfViewSetHash, thanks.

> Also, the use case needs to specify the nth match.

This means we'd only want to highlight and "jump to" the 3rd or 11th or whatever occurrence of the match.

Ideas include:

  1. search="some phrase",3

  2. search="some phrase"&searchocc=3

I like #1 better.
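To make idea 1 concrete, here is a minimal sketch of how such a value could be parsed. The syntax is only the proposal from this comment (it is not part of the Adobe document), `parseSearchValue` is a hypothetical name, and defaulting to the first occurrence when no number is given is an assumption.

```javascript
// Hypothetical parser for the proposed syntax: a quoted phrase, optionally
// followed by a comma and a 1-based occurrence number, e.g. "some phrase",3
function parseSearchValue(value) {
  const match = /^"([^"]*)"(?:,(\d+))?$/.exec(value);
  if (match === null) {
    return null; // value is not in the proposed format
  }
  return {
    phrase: match[1],
    // Assumption: default to the first occurrence when no number is given.
    occurrence: match[2] === undefined ? 1 : Number(match[2]),
  };
}

const parsedExample = parseSearchValue('"some phrase",3');
console.log(parsedExample.phrase); // prints some phrase
console.log(parsedExample.occurrence); // prints 3
```

Idea 2 would not need this step, since searchocc= would arrive as a separate parameter.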

From the docs that @bf linked to:

● search=wordList
Opens the Search UI and performs a search for the specified word list in the document. Matching words are highlighted in the document. The words must be enclosed in quotes and separated by spaces; for example:

search=”word1 word2”

● Individual parameters, together with their values (separated by & or #), can be no greater than 32 characters in length.
● You cannot use the reserved characters =, #, and &. There is no way to escape these special characters.

So it seems like pdf.js find searches for the whole string, while the PDF open parameters seem to suggest it searches on each word, i.e. pdf.js searches for "inner loop" while PDF open searches for either "inner" or "loop". If I understand this correctly, any thoughts on reconciling the behaviors?
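The difference between the two readings can be sketched like this. It is only an illustration of the two matching modes as described above, not code from pdf.js or Acrobat; both function names are hypothetical and the case-insensitive comparison is an assumption.

```javascript
// Phrase search: the whole query string must appear in the text.
function matchesPhrase(text, query) {
  return text.toLowerCase().includes(query.toLowerCase());
}

// Word-list search: it is enough for any one of the space-separated words
// to appear in the text.
function matchesAnyWord(text, query) {
  const haystack = text.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean) // drop empty strings from leading/trailing spaces
    .some((word) => haystack.includes(word));
}

const sampleLine = "the loop runs once";
console.log(matchesPhrase(sampleLine, "inner loop")); // prints false
console.log(matchesAnyWord(sampleLine, "inner loop")); // prints true
```

The same URL can therefore produce different hits depending on which of the two modes a reader implements.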

PS Even in the online demo, sometimes command-F (I'm on a Mac) brings up Chrome's search and not pdf.js'.

yurydelendik commented 12 years ago

> This means we'd only want to highlight and "jump to" the 3rd or 11th or whatever occurrence of the match.

I'm not finding that in PDFOpenParameters.pdf. So let's make it about "search=" only.

> So it seems like pdf.js find searches for the whole string, while the PDF open parameters seem to suggest it searches on each word, i.e. pdf.js searches for "inner loop" while PDF open searches for either "inner" or "loop". If I understand this correctly, any thoughts on reconciling the behaviors?

Let's implement "Opens the Search UI and performs a search for the specified phrase in the document". If needed, we can address the rest of it by changing the pdf.js search algorithm later.

toddzebert commented 12 years ago

> I'm not finding that in PDFOpenParameters.pdf. So let's make it about "search=" only.

Correct, it's not in PDFOpenParameters.pdf, BUT that's what my client's use case needs. I can't really provide them with half a solution. I'm thinking of going with #searchocc=occurrence-number, as it'll probably be ignored by any other PDF Open implementation without breaking #search.

> Let's implement "Opens the Search UI and performs a search for the specified phrase in the document". If needed, we can address the rest of it by changing the pdf.js search algorithm later.

Cool.

Snuffleupagus commented 11 years ago

> Let's implement "Opens the Search UI and performs a search for the specified phrase in the document". If needed, we can address the rest of it by changing the pdf.js search algorithm later.

@yurydelendik Is this something that you still would like to see get implemented?

yurydelendik commented 11 years ago

@Snuffleupagus that will be great. Only "search=" needs to be implemented. For now, it can be implemented as a phrase search. Later we will see if it can be changed to a word search.

Snuffleupagus commented 11 years ago

@yurydelendik Ok, I'll get started on this.

TattyFromMelbourne commented 11 years ago

Just an FYI really: the search option in the PDF open parameters only provides for lists of words, but it would be really great to be able to search for whole strings as well. So, the example (given in http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf), #search=”word1 word2”, will search for occurrences of the words word1 OR word2, but there is currently no option to search for the phrase "word1 word2".

jviereck commented 11 years ago

> Just an FYI really: the search option in the PDF open parameters only provides for lists of words, but it would be really great to be able to search for whole strings as well.

Seems like the PR implements this. You can test it here:

http://107.21.233.14:8877/cd4f97cdd4dfcee/web/viewer.html#search="Trace-based"

TattyFromMelbourne commented 11 years ago

Wow, that's fantastic.

I guess my only remaining question is this: is pdf.js going to try to conform to the way that Adobe has specified the PDF Open Parameters, in that:

http://107.21.233.14:8877/cd4f97cdd4dfcee/web/viewer.html#search=%22Trace-based%20compiler%22 would retrieve a list (the search would then be for strings that match "Trace-based" OR "compiler")

and use some other URL parameter scheme for an exact string/phrase match, say, for instance, something like:

http://107.21.233.14:8877/cd4f97cdd4dfcee/web/viewer.html#phrase=%22Trace-based%20compiler%22

The reason why I ask this is because if you had a URL like:

http://107.21.233.14:8877/cd4f97cdd4dfcee/web/viewer.html#search=%22Trace-based%20compiler%22

the results would be dependent on which PDF reader the client was using. Surely it would be preferable to aim for a URL parameter scheme that augments the existing one, not "breaks" it. (I use the term "break" loosely here...because it is open to question whether Adobe or Verity, who they bought up, ever really committed to any sort of real openness or process in specifying the Open PDF Parameters.)

TattyFromMelbourne commented 11 years ago

P.S. Sorry, my bad: Verity had nothing to do with it, and I stand corrected after a quick Google search. I guess Adobe really did make it an open standard. See the (good old) Wikipedia entry http://en.wikipedia.org/wiki/Portable_Document_Format.

My question, however, still stands.

bf commented 11 years ago

@yurydelendik is it possible to merge @Snuffleupagus' PR?

aaa2103 commented 11 years ago

Hello jviereck,

With this URL: http://107.21.233.14:8877/cd4f97cdd4dfcee/web/viewer.html#search="Trace-based", the link has a double quotation mark at the beginning but no ending quotation mark; that is, it uses #search="Trace-based and NOT #search="Trace-based". I have tried it with both the beginning and ending quotation marks, as well as with only the beginning quotation mark, but nothing happened.

SuperSpe commented 11 years ago

I'm trying to use #search in the URL but can't get it working. The search UI is not displayed, nor is the text highlighted. I tried #scale="100" and it doesn't work either. What am I doing wrong? The #page parameter works well. I'm using FF 20.0.1.

Snuffleupagus commented 11 years ago

> I'm trying to use #search in the URL but can't get it working. The search UI is not displayed, nor is the text highlighted.

That's because this functionality hasn't been implemented in pdf.js. The open PR #2485 has a partial implementation of #search=. The reason that this PR hasn't been merged yet is probably that the way pdf.js implements searching doesn't conform exactly to the Adobe specification. I'll ping one of the developers to see if #2485 can be merged.

> I tried #scale="100" and it doesn't work either.

I can't find #scale= in the specification, but using #zoom= should work once #2970 lands.

mozbugbox commented 11 years ago

I'd like to specify a page number to search on. If the search on the given page fails, it should stop rather than proceed to other parts of the document. That way, we can use pre-indexed data for quick search without going through the whole 100MB PDF file.
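The requested behavior could look roughly like the sketch below. It assumes the text of each page is already available as a string (the `pageTexts` array and the `findOnPage` name are hypothetical, not pdf.js API) and that matching is case-insensitive.

```javascript
// Search a single page only. pageTexts is a hypothetical array of page
// strings where index 0 holds the text of page 1. Returns the character
// offsets of all non-overlapping matches on that page, or an empty array.
function findOnPage(pageTexts, pageNumber, query) {
  const text = pageTexts[pageNumber - 1];
  const offsets = [];
  if (text === undefined || query === "") {
    return offsets; // page out of range or nothing to search for
  }
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    offsets.push(from);
    from = haystack.indexOf(needle, from + needle.length);
  }
  return offsets; // an empty result means: stop, do not search other pages
}

const samplePages = ["abc abc", "xyz"];
console.log(findOnPage(samplePages, 1, "abc").length); // prints 2
console.log(findOnPage(samplePages, 2, "abc").length); // prints 0
```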

fhalperin commented 11 years ago

Although there seems to be no way of doing it with a URL parameter, you can still do it with JS:

    PDFFindBar.open(); // optional, if you want to show the search bar
    PDFFindBar.findField.value = 'your search term';
    PDFFindBar.highlightAll.checked = true;
    PDFFindBar.findNextButton.click();

Cheers!

derekdickerson commented 10 years ago

@fhalperin

Would it be possible to show an example of the search box? I just need a small example. I'm trying my best to search for text in a folder of PDF files.

srkunze commented 9 years ago

Will this also work for iframes?

charliec114 commented 9 years ago

I tried this:

    PDFFindBar.open(); // optional, if you want to show the search bar
    PDFFindBar.findField.value = 'your search term';
    PDFFindBar.highlightAll.checked = true;
    PDFFindBar.findNextButton.click();

but I can't get it to execute properly.

I tried it inside $(document).ready(function(){

but it does not work for me. Can somebody help me? I use PDF.js 1.1.3.

Sorry, but my English is not the best.

charliec114 commented 9 years ago

It works!!!

I added:

    if ('search' in params) {
        PDFViewerApplication.findBar.open();
        PDFViewerApplication.findBar.findField.value = params.search;
        PDFViewerApplication.findBar.highlightAll.checked = true;
        PDFViewerApplication.findBar.findNextButton.click();
    }

in pdfViewSetHash.

Thanks for everything.

AjayParmar commented 9 years ago

Hello friends, I am new to GitHub, so please tell me how to download and use PDF.js. I have a project in which I have to use Flux Player, which starts IE, but when I link a PDF file the page parameter does not work. My project works fine in normal IE, Firefox and Chrome, but not in IE with Flux Player, so I think PDF.js will help me open a specific page of a PDF file. Kindly guide me. Thank you.

Rob--W commented 9 years ago

@AjayParmar See https://github.com/mozilla/pdf.js/wiki/Setup-pdf.js-in-a-website

AjayParmar commented 9 years ago

@Rob--W Thank you very much for the help. I'd like to ask one more question: can I run it from a DVD? My project always runs from a DVD, with no web server. What extra utility do I need in order to use it from a DVD?

timvandermeij commented 9 years ago

@AjayParmar Please do not post unrelated questions on this issue. Your question does not have much to do with the URL parameters that this issue is about. For such questions, please use the mailing list or IRC.

Rob--W commented 9 years ago

@AjayParmar That's going to be a bit difficult, because web browsers restrict access to local files, even if the web page is a local file. If you want to go down that route, converting the file to base64 (or a typed array) and inlining the data in the viewer is probably the best course of action. This use is not explicitly supported, so you're completely on your own if you want to use PDF.js at file://.
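The conversion step mentioned above (base64 text to a typed array) can be sketched as follows. This is a minimal sketch under assumptions: `base64ToUint8Array` is a hypothetical helper name, it relies on the standard `atob` function, and how the resulting bytes are handed to the viewer depends on the PDF.js version in use.

```javascript
// Decode a base64 string into a Uint8Array of raw bytes.
function base64ToUint8Array(base64) {
  const binary = atob(base64); // each character holds one byte value (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "JVBERi0=" is the base64 form of the five bytes "%PDF-", the marker that
// PDF files start with.
const headerBytes = base64ToUint8Array("JVBERi0=");
console.log(headerBytes.length); // prints 5
console.log(String.fromCharCode(...headerBytes)); // prints %PDF-
```

Inlining a whole document this way makes the page considerably larger than the PDF itself, since base64 adds roughly a third to the size.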

AjayParmar commented 9 years ago

@Rob--W Thank you very much. @timvandermeij Sorry for the inconvenience, I will remember it. Thank you.

srkunze commented 9 years ago

It seems like this is not working in iframes, right?

mashwinfugro commented 9 years ago

Hello, I would like to change the default URL of viewer.js and pass a different URL to it. I have an HTML file that contains links to different PDF files. When I click on any of the links, I want that file to be opened using the viewer.html file. Is it possible to do that? What changes need to be made? Please help.

timvandermeij commented 9 years ago

@mashwinfugro Your question has nothing to do with the original point of this issue, namely URL parameters for page and search. Please do not post unrelated questions on issues. Use the mailing list instead.

mashwinfugro commented 9 years ago

@timvandermeij Sorry for the inconvenience..

ekraffmiller commented 9 years ago

@charliec114 , I'm trying to make the same change that worked for you - where did you add your code in viewer.js? I don't see pdfViewSetHash() in the latest version of viewer.js.

paulcpk commented 8 years ago

@ekraffmiller found it under PDFLinkService_setHash

So if anybody is still trying to implement the workaround for this, go ahead and paste

    if ('search' in params) {
        PDFViewerApplication.findBar.open();
        PDFViewerApplication.findBar.findField.value = params.search;
        PDFViewerApplication.findBar.highlightAll.checked = true;
        PDFViewerApplication.findBar.findNextButton.click();
    }

into PDFLinkService_setHash in your viewer.js. Note that this doesn't play nicely with the other bookmark parameters.

jamesvanhallen commented 8 years ago

Is it possible to find a word in an Android WebView?

Snuffleupagus commented 8 years ago

#page=... has been supported for years, and with PR #5579 adding support for #search=..., I'm closing this issue as fixed.

xaviervalette commented 7 years ago

Hello everyone, I have a plan in .dwg format that I convert to .pdf. My goal is to open this PDF and programmatically run a search to highlight the ID of an entity in the plan. I work in C# and I tried:

myProcess.StartInfo.Arguments = "/A \"search=entity_name\" address_pdf";

It opens the advanced search of Acrobat, but nothing happens in it. Could you enlighten me?

yurydelendik commented 7 years ago

> It opens the advanced search of Acrobat, but nothing happens in it

Support for Acrobat is out of scope for this project. Please contact the vendor.

Timmytbone93 commented 7 years ago

Are you creating the search page yourself? If so, you might want to try getting the word occurrence from the backend and adding it as a URL parameter. In my case I use PHP, but any server-side language should work. The values after search= and occurance= are PHP variables holding that data.

web/viewer.html?file=data\Newspapers\Daily_Read\2010s\2012\2012-04-29.pdf&search=John&occurance=0

Then I grab it in viewer.js:

    if ('search' in params) {
        searchPDF(params['search'], params['occurance']);
    }

Then, to actually send the data and make it jump to the occurrence:

    function searchPDF(td_text, td_occurance) {
        // must modify: finds partial words and not the actual word
        PDFViewerApplication.findBar.findField.value = td_text;
        PDFViewerApplication.findBar.findField.occurance = td_occurance;
        for (var i = 0; i < td_occurance; i++) {
            PDFViewerApplication.findBar.findNextButton.click();
        }

        PDFViewerApplication.findBar.open();
        PDFViewerApplication.findBar.caseSensitive.checked = false;
        PDFViewerApplication.findBar.highlightAll.checked = true;
        // PDFViewerApplication.findBar.close();
    }

The only problem is that my search page does not find partial matches. For example, if you search for John in pdf.js you will get results such as Johnny and Johnathan. I am trying to figure out where the regular expression is that allows this. But for now this is the best I've got. Hope this helps.
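For comparison, whole-word matching of the kind a backend search page might do can be sketched with a word-boundary regular expression. This is an illustration only, not the expression pdf.js uses; `escapeRegExp` and `countWholeWordMatches` are hypothetical helper names, and `\b` only recognizes ASCII letters, digits and underscore as word characters.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count case-insensitive matches of `word` that are not part of a longer word.
function countWholeWordMatches(text, word) {
  if (word === "") {
    return 0;
  }
  const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  const matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

const sampleText = "John met Johnny and Johnathan. john left.";
console.log(countWholeWordMatches(sampleText, "John")); // prints 2
```

A plain substring search would report four hits in the same text, which is the kind of mismatch that throws off an occurrence index computed on the backend.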

Ranjitakh commented 7 years ago

Here, instead of 2 for the page number, can I send a value from a text field? Example: <iframe src="abc.pdf#page="TextfieldValue"&toolbar=0">

Timmytbone93 commented 7 years ago

Pretty sure you have to look at the params. You would assign the value of the text box to the page number variable.

    if ('page' in params) {
        // textBoxValue is a placeholder: the value of the text box, or however you get the data
        pageNumber = textBoxValue;
    }
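On the embedding side, the iframe URL from the question could be assembled along these lines. This is a minimal sketch: `buildPdfSrc` is a hypothetical helper, and it assumes the text field should contain a positive whole number.

```javascript
// Build an iframe src such as "abc.pdf#page=7&toolbar=0" from user input.
// If the input is not a positive whole number, the plain URL is returned.
function buildPdfSrc(pdfUrl, pageValue) {
  const trimmed = String(pageValue).trim();
  if (!/^[1-9]\d*$/.test(trimmed)) {
    return pdfUrl; // invalid input: leave the fragment off
  }
  return pdfUrl + "#page=" + trimmed + "&toolbar=0";
}

console.log(buildPdfSrc("abc.pdf", " 7 ")); // prints abc.pdf#page=7&toolbar=0
console.log(buildPdfSrc("abc.pdf", "two")); // prints abc.pdf
```

In a page, the result would be assigned to the iframe's src attribute whenever the text field changes.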

ayushpratap commented 7 years ago

@plck: Hey, I was searching for this solution, but I have not yet been able to identify where to put this code. There is no file named PDFLinkService_setHash, but there is a file named PDFLinkService, so could you please guide me a little on where I need to look and dig to embed this auto-search functionality in my project?