pdf.js relies on urls to contain the 'pdf' extension

octosquid commented 10 years ago

When the server doesn't provide the Content-Disposition header, pdf.js relies on urls to contain the 'pdf' extension. But URLs are locators, not names. Steps to reproduce:

mv web/compressed.tracemonkey-pldi-09.pdf web/compressed.tracemonkey-pldi-09
sed -i 's/compressed.tracemonkey-pldi-09.pdf/compressed.tracemonkey-pldi-09/g' web/viewer.js
firefox web/viewer.html

Now click on download. You will be offered a 'document.pdf' file. The name should be something more meaningful. The bug also happens when I leave out the pdf extension on my apache web server.

Proposed solution: Use the title of the pdf. (as this viewer.js code). The title is also used by firefox for the File -> "save page as" functionality when displaying a HTML page like http://en.wikipedia.org/wiki/Internet_media_type .

Not every html web page ends in .html. Instead by the extension, a document's type is specified by its MIME-type. However, most pdf files have the pdf extension, and most pdfs online also have a good-to-store name in the url. I don't know whether the new retrieval method should overwrite the url retrieval, or be a fallback to it.

See also #3455.

Mercieral commented 8 years ago

@timvandermeij

Any update on this? It has been open for more than two years now. Because my file param is a server call that sends a pdf file back, the pdf viewer is not able to detect the name of the file because it seems to be looking for a .pdf extension and so I'm stuck with "document.pdf" when downloading and "untitled.pdf" in the window bar when viewing.

It would be handy if we could also specify a "title" in the URI as well as the "file" such as .../pdf-viewer/viewer.html?file="..."&title="..."

timvandermeij commented 8 years ago

I know that currently work is being done in #7554 to support the Content-Disposition header, which is a way to solve this issue. I do agree, however, that document.pdf is not the best possible name and we might need to improve the function for getting the (file)name. Patches for this are welcome, so I'm labeling this as a good beginner bug as it should not be too hard to implement.

Mercieral commented 8 years ago

@timvandermeij Excellent thank you, I believe supporting Content-Disposition would actually fix my issue.

I agree, as I was going through the code I noticed it should not be too difficult to just add another URL param for the filename. I'll give it a try in the next few days, Thanks.

Snuffleupagus commented 8 years ago

Patches for this are welcome, so I'm labeling this as a good beginner bug as it should not be too hard to implement.

@timvandermeij Please remember that in PR #4956 we purposely moved away from letting various hash parameters affect the viewer (unless debugging is enabled, see https://github.com/mozilla/pdf.js/wiki/Debugging-pdf.js). Hence I do not think that we should make it possible to specify the title using a hash parameter!

Especially considering that it would be non-standard (in the context of http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf), and compared to the Content-Disposition approach in PR #7554, it really doesn't add much value.

timvandermeij commented 8 years ago

Sorry, I should have been more clear. I meant that patches are welcome for improving the function that determines the file name from the URL. I think we can do better there instead of only relying on the file extension. I agree that we should not add more hash parameters.

anirudhrb commented 7 years ago

What's the status of this issue? Is this still open?

yurydelendik commented 7 years ago

What's the status of this issue? Is this still open?

@anirudhrb still opened, there was an attempt to implement that at #7554, would you like contribute to that?

anirudhrb commented 7 years ago

@yurydelendik Yes, I would like to contribute. What is expected in a PR for this issue?

yurydelendik commented 7 years ago

@anirudhrb, you may just take the above PR as a base since it has remoting of data somewhat right -- we would expect small patch with a unit tests. We don't need spec Content-Disposition parsing, but enough to get filename.

anirudhrb commented 7 years ago

@yurydelendik I have started working on this. This is my first attempt at contributing to an open-source project. I'll need some time to get comfortable with the codebase. :)

himanish-star commented 6 years ago

@yurydelendik , @timvandermeij Could I take this issue up if it's okay with you all?

timvandermeij commented 6 years ago

There is a pull request above which looks like the right direction, but there has been no more activity for it. If you're interested in fixing up that one, that sounds good. I'll ask if the original author is still planning to work on it.

timvandermeij commented 6 years ago

Fixed in #9379.

mozilla / pdf.js

pdf.js relies on urls to contain the 'pdf' extension #4284