samvera-deprecated / sufia

[DEPRECATED] Sufia: a fully featured, flexible Samvera repository front-end.
http://sufia.io/

PDF Downloads from DownloadsControllerBehavior are exceptionally slow in Chrome. #1527

Open scande3 opened 8 years ago

scande3 commented 8 years ago

I've banged my head against this for about a week now... but clicking "download" on a PDF with image files takes quite some time before any indication that it is loading is given.

This appears to happen in Scholarsphere. Some examples (if you try them in Chrome): https://scholarsphere.psu.edu/downloads/x346dr33j https://scholarsphere.psu.edu/downloads/w95050482

An example from the system I've been working with (Sufia 6.4.0 based): https://www.digitaltransgenderarchive.net/downloads/dn39x154f

In the meantime, I've hacked the primary display to force a file download, which is instantaneous. You can see that there is no delay for a forced download of that same object at: https://www.digitaltransgenderarchive.net/files/dn39x154f (Not a permanent solution, as non-technical users may not realize the file was downloaded when they clicked...).

As I've tried everything I can think of, and the symptom appears in other Sufia heads with no open issue, I'm reporting it to see if anyone has an idea for how to correct this beyond generating individual images for every page of the PDF. Thanks!

mjgiarlo commented 8 years ago

I'd be interested in hearing from other folks who've got Sufia 6-based apps up. Thoughts, @cam156 @awead @hectorcorrea @pgwillia @jcoyne @hackmastera @weiweishi @ojlyytinen @elrayle @narogers @barmintor :question:

Thanks for reporting this, @scande3

awead commented 8 years ago

Yes, I've noticed this recently as well. I think it's some kind of PDF viewer that Chrome uses. The files load much faster in Safari. I haven't tried out Firefox yet.

hectorcorrea commented 8 years ago

Firefox downloads pretty fast. I wonder what the PDF viewer in Chrome does differently that takes that long.

@scande3 what did you do in your hack to improve the speed?

scande3 commented 8 years ago

@hectorcorrea - I can link to the two relevant places later tonight, as I can override at least one of them in a better location. Essentially, I added an HTML "download" attribute to "render_download_icon" and appended ".pdf" to the end of the filename in DownloadsControllerBehavior (when the mimetype is "application/pdf", obviously). But this really isn't much of a solution.
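To illustrate the two changes described above, here is a minimal plain-Ruby sketch. It is not the actual override from the DTA application: the helper names `forced_download_filename` and `download_link` are hypothetical, and the real change lives in `render_download_icon` and DownloadsControllerBehavior.

```ruby
require "erb"

# Hypothetical helper: append ".pdf" to the filename only when the
# mimetype is "application/pdf" and the name does not already end in it.
def forced_download_filename(name, mime_type)
  return name unless mime_type == "application/pdf"
  name.downcase.end_with?(".pdf") ? name : "#{name}.pdf"
end

# Hypothetical helper: build an anchor carrying the HTML "download"
# attribute, which asks the browser to save the target rather than
# open it in its built-in viewer.
def download_link(href, filename, label = "Download")
  h = ERB::Util.method(:html_escape)
  %(<a href="#{h.call(href)}" download="#{h.call(filename)}">#{h.call(label)}</a>)
end

name = forced_download_filename("dn39x154f", "application/pdf")
puts download_link("/downloads/dn39x154f", name)
```

Because the browser saves the file instead of handing it to the PDF viewer, the viewer never gets a chance to request the file piece by piece, which is presumably why the forced download shows no delay.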

Yes, only Chrome has this issue as far as I am aware. It doesn't actually affect very, very small PDF files, but it gets exponentially worse the larger the PDF file is. For some reason, it doesn't appear to happen when running both the application and Fedora 4 under localhost, which has made this challenging to debug. Attempts to stream the file via other means by overriding parts of DownloadsControllerBehavior have yet to yield positive results.

awead commented 8 years ago

@scande3 is it possible to link directly to the content in Fedora? This would rule out problems with Hydra's downloads controller. Although, now that I think about it, a link to Fedora would probably just download the file as opposed to putting it in view in the page.

scande3 commented 8 years ago

@awead - I think it would indeed force the download in Fedora 4 (and I don't know how to bypass the login requirement in that system).

In Fedora 3, though, this issue does not occur with a direct link when rendering in Chrome. For a sample Digital Commonwealth item, http://ark.digitalcommonwealth.org/ark:/50959/j0994m20m , the following 18.6 MB PDF renders quickly on my connection in Chrome without issue: https://fedora.digitalcommonwealth.org/fedora/objects/commonwealth:j0994q47n/datastreams/productionMaster/content

ojlyytinen commented 8 years ago

I've not noticed this on our system, but that's probably mostly because we don't have a lot of material yet and most of it's not PDFs.

But I thought of something I noticed some time ago in relation to the one-time-use links. Opening a PDF didn't work because Chrome would send an http request for the file and only after getting a response back from the server it would realise that it needs to open the PDF reader which would send another http request for the file. In the case of one-time-use links the second request would fail because the file had already been accessed. Maybe it's again doing two requests for the file and the second one, which is the one that actually matters, is delayed somehow because of the first? This is just pure guessing but if you're out of ideas it might be something to investigate.

ojlyytinen commented 8 years ago

I looked at the network traffic a bit and it seems that the Chrome PDF reader downloads the file in fairly small chunks by sending a whole bunch of HTTP requests using the Range header. For one of the examples above I got 13 HTTP requests in total for a file sized 770 kB. The first request gets the first 32 kB of the file, the second gets the next 260 kB or so, then comes the last 32 kB of the file, and then the remaining bits of the file sequentially in 32 kB chunks. Each of these creates two LDP requests to Fedora.
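For readers unfamiliar with Range requests, the following is a minimal Ruby sketch of how a server might interpret a single-range `Range: bytes=...` header and build the matching `Content-Range` value for a 206 Partial Content response. The function names are hypothetical and this is not the code in DownloadsControllerBehavior; it only illustrates the standard header syntax behind the requests described above.

```ruby
# Parse a single-range header value such as "bytes=0-32767",
# "bytes=32768-" or "bytes=-32768" against a file of total_size bytes.
# Returns [first, last] as inclusive byte offsets, or nil when the
# header is malformed or the range cannot be satisfied.
def parse_byte_range(header, total_size)
  match = /\Abytes=(\d*)-(\d*)\z/.match(header.to_s.strip)
  return nil unless match

  first_s = match[1]
  last_s = match[2]
  return nil if first_s.empty? && last_s.empty?

  if first_s.empty?
    # Suffix form "bytes=-N": the final N bytes of the file.
    suffix = last_s.to_i
    return nil if suffix.zero?
    first = [total_size - suffix, 0].max
    last = total_size - 1
  else
    first = first_s.to_i
    # An open-ended or oversized range is clamped to the end of the file.
    last = last_s.empty? ? total_size - 1 : [last_s.to_i, total_size - 1].min
  end

  return nil if first > last || first >= total_size
  [first, last]
end

# Value of the Content-Range header sent with a 206 response.
def content_range(first, last, total_size)
  "bytes #{first}-#{last}/#{total_size}"
end

# Illustrative size only, roughly matching the 770 kB example.
size = 770_000
first, last = parse_byte_range("bytes=-32768", size)
puts content_range(first, last, size)
```

If every such request triggers its own round trips to Fedora, as observed above, the per-request overhead is paid once per chunk rather than once per file.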

hectorcorrea commented 8 years ago

@ojlyytinen this is very interesting. I would be curious how many requests Firefox or Safari send for the same file.

ojlyytinen commented 8 years ago

Haven't tried Safari but Firefox gets the entire file in one go. Chrome does too if you click Save as instead of opening it in the browser.

scande3 commented 8 years ago

For those with this issue - I have a temporary solution in the following lines that seems to be working in Chrome for me; you can test it with the DTA example in the original post:

https://github.com/CollegeOfTheHolyCross/dta_sufia/blob/master/app/controllers/downloads_controller.rb#L10-L18

(I'm not sure what part of the normal DownloadsControllerBehavior is causing it still... would require more research... but that works for the moment).

barmintor commented 8 years ago

@scande3 out of curiosity: what version of Fedora 3? It didn't support range requests until the final releases.