mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Content-Length header exposed in CORS configuration for remote docs breaks PDF viewer #4530

Closed: chrisblizm closed this issue 10 years ago.

chrisblizm commented 10 years ago

Loading a remote PDF document with the viewer fails if the remote server's CORS configuration exposes the Content-Length header to the browser (provided the document is larger than twice the chunk size).

A local Apache installation was used to demonstrate this issue with the pdf.js project viewer: http://mozilla.github.io/pdf.js/web/viewer.html?file=//localhost/pdfjs/PDF32000_2008.PDF

The document used in testing is located here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf

If the Apache CORS configuration does not expose the Content-Length header (Access-Control-Expose-Headers "content-length, accept-ranges"), the viewer downloads the entire document before drawing any pages.

If the configuration is changed to expose those headers (Content-Length and Accept-Ranges), the viewer aborts the initial GET request in favor of a new GET that specifies byte ranges. However, only an OPTIONS preflight request is sent and answered. The remote resource is never processed further, and no document is displayed at all.

This issue affects PDFJS.version = '0.8.1314' (generic build).
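The switch described in this report can be modeled as a small decision function. This is a hypothetical Python sketch, not pdf.js's actual code. The name `should_switch_to_ranges`, the 64 KiB chunk size and the Content-Encoding check are assumptions. Only the "larger than twice the chunk size" rule and the dependence on readable Content-Length/Accept-Ranges headers come from the report. A header the browser hides from script behaves like a missing header (`None`). That is why an unexposed Content-Length keeps the viewer on the full download.

```python
from typing import Callable, Optional

# Assumption: a 64 KiB chunk size stands in for the viewer's configured chunk size.
DEFAULT_CHUNK_SIZE = 65536

HeaderGetter = Callable[[str], Optional[str]]


def should_switch_to_ranges(get_header: HeaderGetter, chunk_size: int = DEFAULT_CHUNK_SIZE) -> bool:
    """Return True if the full GET should be aborted in favor of byte-range loading.

    get_header returns None for any header the browser hides from script,
    which is what happens when the server does not expose it via CORS.
    """
    length_text = get_header("Content-Length")
    if length_text is None or not length_text.strip().isdigit():
        return False  # size unknown to script: keep downloading the whole file
    if int(length_text) <= 2 * chunk_size:
        return False  # small file: a single GET is cheaper than chunked loading
    if get_header("Accept-Ranges") != "bytes":
        return False  # ranges not advertised, or not visible to script
    # Assumption: an encoded (e.g. gzip) body is not byte-addressable, so stay on full GET.
    return (get_header("Content-Encoding") or "identity") == "identity"


if __name__ == "__main__":
    print(should_switch_to_ranges({}.get))  # False: nothing exposed
    print(should_switch_to_ranges({"Content-Length": "9000000", "Accept-Ranges": "bytes"}.get))  # True
```

Passing a plain `dict.get` keeps the sketch independent of any HTTP library. In a browser, the equivalent getter would be the XHR's `getResponseHeader`.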

chrisblizm commented 10 years ago

Good news, this isn't a spooky problem in network.js, it's all a matter of CORS configuration.

For the JavaScript viewer to display a remotely hosted PDF document, the following configuration values are necessary:

Access-Control-Allow-Origin "whatever.hostname.you.have.or.*.for.all.of.them"
Access-Control-Allow-Headers "range"
Access-Control-Expose-Headers "content-range, content-length, accept-ranges"
Access-Control-Allow-Methods "GET"
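To see these four headers on the wire, here is a minimal test server built on Python's standard `http.server`. This is a sketch, not part of pdf.js or Apache. `CorsPdfHandler`, `PDF_BYTES`, `start_server` and `fetch` are hypothetical names, and `"*"` replaces the origin placeholder. The server answers the OPTIONS preflight with the CORS policy. It answers a single-range GET with `206 Partial Content` and a `Content-Range` header.

```python
import http.client
import re
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# The four settings from the comment above. "*" stands in for
# "whatever.hostname.you.have"; use your viewer's origin in practice.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "range",
    "Access-Control-Expose-Headers": "content-range, content-length, accept-ranges",
    "Access-Control-Allow-Methods": "GET",
}

# Hypothetical payload standing in for a real PDF file.
PDF_BYTES = b"%PDF-1.4\n" + b"0" * 200_000

# Only a single "bytes=start-end" or "bytes=start-" range is handled here.
RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)$")


class CorsPdfHandler(BaseHTTPRequestHandler):
    def _send_cors(self) -> None:
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)

    def do_OPTIONS(self) -> None:
        # Preflight for the ranged GET: reply with the CORS policy and no body.
        self.send_response(204)
        self._send_cors()
        self.end_headers()

    def do_GET(self) -> None:
        total = len(PDF_BYTES)
        match = RANGE_RE.match(self.headers.get("Range", ""))
        if match:
            start = int(match.group(1))
            end = min(int(match.group(2) or total - 1), total - 1)
            if start > end:
                self.send_response(416)  # range starts past the end of the file
                self.send_header("Content-Range", f"bytes */{total}")
                self.send_header("Content-Length", "0")
                self._send_cors()
                self.end_headers()
                return
            body = PDF_BYTES[start : end + 1]
            self.send_response(206)
            self.send_header("Content-Range", f"bytes {start}-{end}/{total}")
        else:
            body = PDF_BYTES
            self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Accept-Ranges", "bytes")
        self.send_header("Content-Length", str(len(body)))
        self._send_cors()
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # silence per-request logging


def start_server() -> ThreadingHTTPServer:
    """Serve on a free localhost port from a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CorsPdfHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, method: str, headers: dict) -> tuple:
    """Send one request to the sketch server; return (status, headers, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, "/doc.pdf", headers=headers)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()
```

To try it with a real viewer, replace `PDF_BYTES` with the bytes of an actual PDF larger than twice the chunk size.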

The Allow-Headers setting is necessary to let the XHR specify which range it wants. Without it, the XHR's secondary request stalls at the OPTIONS preflight request. That request is issued after the viewer discovers that range requests are supported and cancels the original GET in favor of partial content loading. Once this setting is in place, the CORS workflow proceeds.
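The stall at the preflight can be illustrated with a simplified model of the browser's check. This is a hypothetical Python sketch, not any browser's implementation. It assumes a non-credentialed request, ignores header values and `*` wildcards, and treats `Range` as a header that always requires preflight, as browsers did when this thread was written. Header names compare case-insensitively, so the `"range"` above and S3's `Range` both match.

```python
# Simplified model of the browser's CORS preflight decision.
# Assumptions: non-credentialed request, no "*" wildcards, header values ignored,
# and Range treated as non-safelisted (so it always needs a preflight).
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_REQUEST_HEADERS = {"accept", "accept-language", "content-language", "content-type"}


def _csv(value: str) -> set[str]:
    """Split a comma-separated header value into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def needs_preflight(method: str, header_names: list[str]) -> bool:
    """True if the request must be preceded by an OPTIONS preflight."""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    return any(name.lower() not in SAFELISTED_REQUEST_HEADERS for name in header_names)


def preflight_allows(origin: str, method: str, header_names: list[str], preflight_response: dict[str, str]) -> bool:
    """True if the browser would go on to send the actual request."""
    if preflight_response.get("Access-Control-Allow-Origin", "") not in ("*", origin):
        return False
    allowed_methods = _csv(preflight_response.get("Access-Control-Allow-Methods", ""))
    if method.upper() not in SAFELISTED_METHODS and method.lower() not in allowed_methods:
        return False
    allowed_headers = _csv(preflight_response.get("Access-Control-Allow-Headers", ""))
    return all(
        name.lower() in SAFELISTED_REQUEST_HEADERS or name.lower() in allowed_headers
        for name in header_names
    )
```

With only Allow-Origin and Allow-Methods configured, `preflight_allows("http://viewer.example", "GET", ["Range"], ...)` returns `False`. The ranged GET is never sent, which matches the lone OPTIONS request in the original report.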

The Expose-Headers setting is necessary for the browser to make enough of the otherwise restricted response headers available to the XHR. The XHR needs them to determine a) that range requests are the way forward because the content is large enough to warrant them, and b) which byte range was received in each subsequent partial-content response.
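The effect of Access-Control-Expose-Headers on what the XHR can read can be sketched as a filter over the response headers. This is a hypothetical Python model. The safelist reflects how browsers behaved around the time of this thread. As an assumption, recent revisions of the Fetch standard also safelist Content-Length, so newer browsers may reveal it without exposure. The `*` wildcard is not modeled.

```python
# Response headers a script could read cross-origin without being exposed,
# as browsers treated them around the time of this thread.
# Assumption: newer Fetch revisions also add "content-length" to this list.
LEGACY_SAFELISTED_RESPONSE_HEADERS = {
    "cache-control", "content-language", "content-type",
    "expires", "last-modified", "pragma",
}


def script_visible_headers(response_headers: dict[str, str]) -> dict[str, str]:
    """Return the headers getResponseHeader() would reveal, keyed in lowercase.

    For brevity, the Expose-Headers lookup uses the exact key spelling below.
    """
    exposed = {
        name.strip().lower()
        for name in response_headers.get("Access-Control-Expose-Headers", "").split(",")
        if name.strip()
    }
    return {
        name.lower(): value
        for name, value in response_headers.items()
        if name.lower() in LEGACY_SAFELISTED_RESPONSE_HEADERS or name.lower() in exposed
    }
```

Without the expose list, a response carrying Content-Type, Content-Length and Accept-Ranges shows only Content-Type to script. With `"content-range, content-length, accept-ranges"` exposed, all three are readable.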

Adding these settings to the Apache configuration mentioned above makes the test URL with the Mozilla viewer above work.

yurydelendik commented 10 years ago

Really cool. Thanks for looking into this.

blumonkey commented 9 years ago

@chrisblizm Sorry for being late, but in my case the PDF embedded directly via an <iframe> loads without any issues, whereas if I load it through viewer.html?file=/somefile.pdf it says:

 PDF.js v? (build: ?)
 Message: Unexpected server response (0) while retrieving PDF

So if it's a CORS issue, why do the two cases behave differently? Both files are hosted on the same server (localhost). I also tried Firefox, which uses PDF.js by default, and the issue is the same there. Am I missing something or doing it wrong?

simoncpu commented 8 years ago

For those who have stumbled on this issue via Google, a working CORS configuration for S3 as per @chrisblizm's reply is:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin><!-- insert your origin here --></AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <AllowedHeader>Range</AllowedHeader>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <ExposeHeader>Accept-Ranges</ExposeHeader>
        <ExposeHeader>Content-Range</ExposeHeader>
        <ExposeHeader>Content-Encoding</ExposeHeader>
        <ExposeHeader>Content-Length</ExposeHeader>
        <AllowedHeader>Authorization</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

The key is Access-Control-Allow-Headers "range".
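The same S3 rule can also be applied from code. This is a sketch assuming boto3 and valid AWS credentials. `pdfjs_cors_configuration` and `apply_to_bucket` are hypothetical helpers, and the origin and bucket name are placeholders. The dict uses the key names that boto3's `put_bucket_cors` expects. As an assumption, the list under `"CORSRules"` is also the JSON shape the S3 console's CORS editor accepts.

```python
import json


def pdfjs_cors_configuration(origin: str) -> dict:
    """The XML rule above, in the dict shape used by boto3's put_bucket_cors."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [origin],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["Range", "Authorization"],
                "ExposeHeaders": ["Accept-Ranges", "Content-Range", "Content-Encoding", "Content-Length"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def apply_to_bucket(bucket: str, origin: str) -> None:
    """Write the rule to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above has no AWS dependency

    boto3.client("s3").put_bucket_cors(Bucket=bucket, CORSConfiguration=pdfjs_cors_configuration(origin))


if __name__ == "__main__":
    # Print just the rules as JSON, e.g. for pasting into the S3 console.
    print(json.dumps(pdfjs_cors_configuration("https://viewer.example")["CORSRules"], indent=2))
```

Keeping the rule in one builder function means the console JSON and the API call cannot drift apart.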

sureshveeragani commented 7 years ago

Thanks very much, this solution worked for me.

joalcava commented 7 years ago

I'm having this issue (the PDF is not shown until the entire document is downloaded, and the browser shows the message "Refused to get unsafe header Accept-Ranges"), but I'm using Azure Blob Storage. Does anyone know what the CORS configuration should be? I have googled it and tried many things, but I cannot make it work correctly.
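"Refused to get unsafe header Accept-Ranges" is the symptom of Accept-Ranges not being listed in the exposed headers. Below is a sketch of the equivalent Blob Storage rule, assuming the `azure-storage-blob` v12 package. `pdfjs_cors_kwargs` and `apply_to_account` are hypothetical helpers, and the connection string and origin are placeholders. As an assumption, passing `cors=` replaces the account's existing Blob service CORS rules, so merge in any rules you already rely on.

```python
def pdfjs_cors_kwargs(origin: str) -> dict:
    """CORS settings mirroring the Apache rules in this thread, as CorsRule kwargs."""
    return {
        "allowed_origins": [origin],
        "allowed_methods": ["GET"],
        "allowed_headers": ["range"],
        # Exposing accept-ranges is what lets script read it cross-origin.
        "exposed_headers": ["accept-ranges", "content-range", "content-length"],
        "max_age_in_seconds": 3000,
    }


def apply_to_account(connection_string: str, origin: str) -> None:
    """Set the rule on the Blob service (needs azure-storage-blob and credentials)."""
    from azure.storage.blob import BlobServiceClient, CorsRule

    client = BlobServiceClient.from_connection_string(connection_string)
    # Assumption: this replaces the existing Blob service CORS rule list.
    client.set_service_properties(cors=[CorsRule(**pdfjs_cors_kwargs(origin))])
```

The same values can be entered field by field in the Azure Portal's CORS blade for Blob service.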

uzay commented 7 years ago

Has anyone dealt with this issue using box.com?

loretoparisi commented 5 years ago

@simoncpu Thank you for your findings! I have a very similar issue, but with audio content. In my case, however, these headers do not work:

{ 'Content-Length': 5751405,
  'Content-Type': 'audio/mpeg',
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST, GET, OPTIONS',
  'Access-Control-Allow-Headers': 'Range',
  Expires: 0,
  Pragma: 'no-cache',
  'Cache-Control': 'no-cache, no-store, must-revalidate',
  'Accept-Ranges': 'bytes',
  'Content-Range': 'bytes 120429-240237/5751405' }
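The headers above can be checked mechanically. This is a hypothetical Python sketch: `AUDIO_RESPONSE` copies the CORS-relevant subset of the quoted headers, and `parse_content_range` and `diagnose` are illustrative helpers. It assumes all of these headers came from a single 206 response. Under that assumption, two things stand out. First, there is no Access-Control-Expose-Headers, so script cannot read Accept-Ranges or Content-Range. Second, Content-Length equals the full file size rather than the length of the returned range.

```python
import re

# CORS-relevant subset of the response headers quoted above, values as strings.
AUDIO_RESPONSE = {
    "Content-Length": "5751405",
    "Content-Type": "audio/mpeg",
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "POST, GET, OPTIONS",
    "Access-Control-Allow-Headers": "Range",
    "Accept-Ranges": "bytes",
    "Content-Range": "bytes 120429-240237/5751405",
}

CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str):
    """Parse 'bytes start-end/total' into (start, end, total); total is None for '*'."""
    match = CONTENT_RANGE_RE.fullmatch(value.strip())
    if match is None:
        return None
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def diagnose(headers: dict[str, str]) -> list[str]:
    """List likely problems for a cross-origin ranged (206) response."""
    problems = []
    exposed = {h.strip().lower() for h in headers.get("Access-Control-Expose-Headers", "").split(",") if h.strip()}
    for name in ("Accept-Ranges", "Content-Range"):
        if name in headers and name.lower() not in exposed:
            problems.append(f"{name} is sent but not exposed to script")
    parsed = parse_content_range(headers.get("Content-Range", ""))
    if parsed is not None:
        start, end, _ = parsed
        if headers.get("Content-Length") != str(end - start + 1):
            problems.append(f"Content-Length should be {end - start + 1} for this range")
    return problems
```

For `AUDIO_RESPONSE`, `diagnose` reports both unexposed headers and a Content-Length mismatch. The range 120429-240237 covers 119809 bytes.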
lemon-ai commented 5 years ago

Quoting @simoncpu's S3 CORS configuration above:

Where should I add this? Inside the <head> tag of viewer.html?

danjudgereveela commented 2 years ago

Quoting @chrisblizm's explanation above:

This resolved my issue, thank you @chrisblizm.

It was easy to set up in the Azure Portal for Blob Storage. For anyone who needs to configure Blob Storage this way, I've attached a screenshot.