mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Pdf.js not working with my setup. #6355

Closed: thjude closed this issue 9 years ago

thjude commented 9 years ago

Hello,

I'm having trouble making PDF.js work in my current project. I am building a web app with Node and the Express module. One page has a form that accepts multiple files. These files are stored on the server (this works) and then parsed so they can be sorted according to their content (this does not work). To do so, I use an AJAX POST that sends the request to the Express router, which stores the file. The parsing is handled in another file that this router requires.

For now I'm using the Node example, to be sure the PDF.js code I wrote wasn't wrong. The PDF data seems to be OK (I get it with Uint8Array(fs.readFileSync)); it is just a bunch of key-value pairs. But when I pass it to getDocument, nothing is returned, there is no error, and my AJAX request fails (error 500). I tried to find where the error comes from and noticed that the WorkerTransport is never initialized and the code stops there. No console logs are displayed at all. I pasted my code below. Any idea why it stops?

Thank you in advance,
Thibaut

code

Obviously, getDocument takes data instead of pdfpath; the variable pdfpath is just a placeholder from the example.
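Logging a Uint8Array only shows raw byte values, which says little about whether the upload is actually a PDF. A minimal sketch, assuming Node's built-in `fs` module, that reads the file into a Uint8Array the way the post describes and checks for the `%PDF-` header that every PDF file starts with. The helper names `loadPdfBytes`, `hasPdfHeader` and `describeBytes` are hypothetical, not part of PDF.js.

```javascript
const fs = require('fs');

// The first five bytes of a PDF file are the ASCII string "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Read a file from disk and copy it into a Uint8Array, a typed-array
// form that getDocument accepts as raw document data.
function loadPdfBytes(path) {
  return new Uint8Array(fs.readFileSync(path));
}

// Return true if the byte array starts with the "%PDF-" header.
function hasPdfHeader(bytes) {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Summarize the array instead of printing every byte value.
function describeBytes(bytes) {
  const isPdf = hasPdfHeader(bytes);
  // For "%PDF-1.4" the three bytes after the header are the version "1.4".
  const version = isPdf ? String.fromCharCode(...bytes.subarray(5, 8)) : null;
  return { length: bytes.length, isPdf, version };
}
```

For example, `describeBytes(loadPdfBytes('upload.pdf'))` (file name hypothetical) reports the length, whether the header is present, and the declared version. If `isPdf` is false, the problem lies in how the file was stored, not in PDF.js.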

Rob--W commented 9 years ago

500 is an error from Express indicating an internal server error. On its own, this status code is useless for debugging PDF.js. Could you paste the actual error message? By default, Express prints error messages to the console from which the Node script was started.
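A 500 with nothing in the console can happen when a promise rejection never reaches Express, so its default error handler has nothing to log. A minimal sketch of the common "async handler" wrapper, assuming Express 4 semantics, where a rejected promise returned by a route handler is not caught automatically and errors have to be passed to `next(err)`. The name `wrapAsync` is hypothetical; it is a widely used pattern, not an Express API.

```javascript
// Wrap a (possibly promise-returning) route handler so that any failure
// is forwarded to next(err), where Express's error handling can log it.
function wrapAsync(handler) {
  return function wrapped(req, res, next) {
    // Running the handler inside .then() turns a synchronous throw into a
    // rejection, so both kinds of failure end up in the same .catch(next).
    return Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}
```

In a router this would be used as `router.post('/upload', wrapAsync(parseUploadedPdf))`, where `parseUploadedPdf` is a hypothetical async handler that calls getDocument. An error such as a ReferenceError thrown inside it then reaches Express instead of disappearing behind a bare 500.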

thjude commented 9 years ago

The problem is that I get no error message except the 500 error. I also tried different verbosity levels of PDF.js. If I console.log something just before getDocument in my code (or even inside the getDocument function in the pdf.js code), it gets printed on my console, but there is no error message from the PDF.js core code.

The parameters seem to be defined correctly in PDF.js, but after the second line below, nothing is printed anymore. My source.range is undefined; is that a big deal?

```javascript
workerInitializedCapability = createPromiseCapability();
transport = new WorkerTransport(workerInitializedCapability, source.range);
workerInitializedCapability.promise.then(function transportInitialized() {
  transport.fetchDocument(task, params);
});
```

Edit: with some more debugging, I found the main issue: "[ReferenceError: document is not defined]". I guess this refers to the browser's global document object, which doesn't exist because I'm using PDF.js on the server side, and that causes the internal server error. Is there a way to avoid it, since I only need text and data processing? I will try pdf2json, as its code is designed to run server-side.
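The ReferenceError follows from how JavaScript treats undeclared globals: reading an identifier that was never declared throws, whereas `typeof` on the same identifier safely returns `"undefined"`. Node defines no `document` global, so any code path that touches it fails on the server. A minimal sketch with hypothetical helper names (`readDocumentDirectly`, `hasDom`, `tryReadDocument`) showing the difference and the usual guard:

```javascript
// Reading an undeclared identifier throws a ReferenceError at run time.
function readDocumentDirectly() {
  return document.createElement('canvas');
}

// `typeof` on an undeclared identifier does not throw; it yields "undefined".
function hasDom() {
  return typeof document !== 'undefined';
}

// Run the unguarded access and report what happened instead of crashing.
function tryReadDocument() {
  try {
    readDocumentDirectly();
    return 'ok';
  } catch (err) {
    return `${err.name}: ${err.message}`;
  }
}
```

Under plain Node, `hasDom()` is false and `tryReadDocument()` reports the same ReferenceError seen above. Such a guard only keeps your own code from crashing; it does not make a library that needs the DOM work server-side.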

yurydelendik commented 9 years ago

The problem is that I got no error message except error 500.

That is the main reason why PDF.js cannot display the document.

Can a simple HTML page fetch the PDF data from your server? Use XHR with the responseType property set to 'arraybuffer' and check the value of the response property.
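The check suggested above can be sketched as a small browser helper. This is a minimal sketch, not PDF.js code: `fetchPdfBytes` is a hypothetical name, and the XHR constructor is an optional third parameter (an assumption added only so the function can be exercised outside a browser); in a page it defaults to the global `XMLHttpRequest`.

```javascript
// Fetch a URL as raw bytes with XMLHttpRequest and report them via
// onDone(err, bytes). Xhr defaults to the browser's XMLHttpRequest.
function fetchPdfBytes(url, onDone, Xhr = globalThis.XMLHttpRequest) {
  const xhr = new Xhr();
  xhr.open('GET', url);
  // Request an ArrayBuffer rather than text so binary data is not mangled.
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    if (xhr.status < 200 || xhr.status >= 300) {
      // A 500 from the server shows up here, not as a PDF.js error.
      onDone(new Error(`HTTP ${xhr.status} for ${url}`));
      return;
    }
    // xhr.response is an ArrayBuffer; wrap it for byte-level inspection.
    onDone(null, new Uint8Array(xhr.response));
  };
  xhr.onerror = function () {
    onDone(new Error(`Network error for ${url}`));
  };
  xhr.send();
}
```

In the browser, `fetchPdfBytes('/files/example.pdf', (err, bytes) => console.log(err || bytes.byteLength))` (URL hypothetical) shows whether the server returns the document bytes or fails before PDF.js is involved.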

thjude commented 9 years ago

A simple HTML page worked well. The thing is, I wanted to fetch and parse the data server-side. Moreover, I didn't really understand the client side of Node.js. I tried several modules, such as PhantomJS, to get browser-like behaviour, but without much success. As I didn't have much time to dig into it, I changed my approach: I went back to PHP with an Apache server and call PDF.js on the client side. This works great. Thanks for your answers and for this library.

This issue can be closed; it mostly came from my inexperience with Node.js. However, documentation about server-side parsing/fetching would be a real plus for large-scale data-processing applications.

yurydelendik commented 9 years ago

Based on the comment above, closing as a duplicate of #6351.