viewdy / phantomjs2

Automatically exported from code.google.com/p/phantomjs
BSD 3-Clause "New" or "Revised" License

File download #52

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It would be good to accept (and save) 'Content-Disposition: attachment; 
filename=' content.

Original issue reported on code.google.com by alexsa...@gmail.com on 28 Feb 2011 at 10:15

GoogleCodeExporter commented 9 years ago
This is again related to issue 41.

Original comment by ariya.hi...@gmail.com on 28 Feb 2011 at 11:51

GoogleCodeExporter commented 9 years ago
Issue 92 has been merged into this issue.

Original comment by roejame...@gmail.com on 26 Apr 2011 at 5:52

GoogleCodeExporter commented 9 years ago
I'm trying to implement this functionality and not making much progress.  Using 
the attached patch, I run:

$ bin/phantomjs examples/download.js

and get this output:

WebPage instantiated
WebPage instantiated
Download complete - fail
<html><head></head><body></body></html>

I added a cout of "WebPage instantiated" (to verify my debug messages work as
expected).  I also added a cout in my downloadRequested slot.  That one did not
get displayed.  Can someone spot what I'm doing wrong or let me know if I'm on
the completely wrong track?

Here is where I found out about the downloadRequested signal: 
http://doc.qt.nokia.com/latest/qwebpage.html#downloadRequested

Original comment by brian.th...@gmail.com on 23 Jun 2011 at 3:13

Attachments:

GoogleCodeExporter commented 9 years ago
Whoops, here is the patch file attachment without the ANSI color codes

Original comment by brian.th...@gmail.com on 23 Jun 2011 at 3:16

Attachments:

GoogleCodeExporter commented 9 years ago
Any progress on this issue?

Original comment by nperria...@gmail.com on 16 Aug 2011 at 3:05

GoogleCodeExporter commented 9 years ago
No progress as of now.

Original comment by ariya.hi...@gmail.com on 16 Aug 2011 at 4:56

GoogleCodeExporter commented 9 years ago
A friend of mine (http://svay.com/) just told me a nice trick for working
around this issue: use XHR within the page environment and base64 encoding
to retrieve the file contents. It works rather well. For the record,
you can find an example here: http://jsfiddle.net/3kUXy/
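
In case the fiddle goes away, here is a minimal sketch of the general idea. It is not a copy of the fiddle; encodeBinaryString and fetchAsBase64 are names I made up, and it assumes the file URL is known and same-origin:

```javascript
// Base64-encode a "binary string" (one character per byte), as produced by an
// XHR whose MIME type was overridden to 'text/plain; charset=x-user-defined'.
function encodeBinaryString(str) {
  var CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  var out = '';
  for (var i = 0; i < str.length; i += 3) {
    var has1 = i + 1 < str.length;
    var has2 = i + 2 < str.length;
    // Keep only the low byte of each char code; the charset trick puts junk
    // in the high byte for values above 0x7f.
    var b0 = str.charCodeAt(i) & 0xff;
    var b1 = has1 ? str.charCodeAt(i + 1) & 0xff : 0;
    var b2 = has2 ? str.charCodeAt(i + 2) & 0xff : 0;
    out += CHARS.charAt(b0 >> 2);
    out += CHARS.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? CHARS.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? CHARS.charAt(b2 & 63) : '=';
  }
  return out;
}

// Meant to run inside the page (hypothetical helper): fetch synchronously and
// return the body as base64 so it survives the trip out of page.evaluate().
function fetchAsBase64(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, false);
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.send(null);
  return encodeBinaryString(xhr.responseText);
}
```

Both functions have to exist in the page context before they are called, for example by putting them in a file loaded with page.injectJs().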

Original comment by nperria...@gmail.com on 16 Aug 2011 at 4:59

GoogleCodeExporter commented 9 years ago
The URL to the file is not always known so XHR is not a general solution. For 
instance, if you are downloading a utility/bank/cc statement, you may have to 
click a link which will possibly execute some JS code and trigger another page 
load with a frame embedding the PDF. Or the statement comes in as an 
attachment. 

What will it take to support the file download feature? 

Requirement: Download files that come in embedded in the page/frame or as 
attachments. The URLs may or may not be known. Allow saving the files to the 
file system or "upload" them to a web server (so the server can save the files 
in a DB for instance).

Original comment by gopiredd...@gmail.com on 27 Jul 2012 at 8:16

GoogleCodeExporter commented 9 years ago
I've got an early but functional version of this at 

https://github.com/woodwardjd/phantomjs/tree/add_download_capabilities

Example:

var page = require('webpage').create();

page.onUnsupportedContentReceived = function(data) {
   console.log('Got a download at url: ' + data.url);
   page.saveUnsupportedContent('some.file.path', data.id);
   phantom.exit();
}

page.open('http://some.pdf.url.com/some.pdf');

I call this "early but functional" because it works where I've tested it
(Linux, PDF downloads), but it likely has a small memory leak, and I'm not 100%
convinced the callback mechanism I used is ideal.

Comments desired.

Original comment by ja...@recovend.com on 10 Aug 2012 at 6:21

GoogleCodeExporter commented 9 years ago
I've downloaded and built the git for above, but I can't seem to get the 
onUnsupportedContentReceived event to fire and calling saveUnsupportedContent 
throws an undefined error.  Are there special build steps required to enable it?

Thanks,
Robert

Original comment by rotava...@gmail.com on 1 Sep 2012 at 4:21

GoogleCodeExporter commented 9 years ago
No special build steps required, as far as I know.  If
saveUnsupportedContent is undefined, maybe you haven't built the version in
the add_download_capabilities branch  (git checkout
add_download_capabilities  after the git clone)?  Just speculating.

Original comment by ja...@recovend.com on 4 Sep 2012 at 2:44

GoogleCodeExporter commented 9 years ago
I second the XHR+base64 method. It takes another 50+ lines of code to send to
page.evaluate(), and I have to de-base64 the content afterward, but that's
basically how CasperJS does it, as far as I can tell from their code (they do
a lot of weird, and in my book unnecessary, binding with window.__utils__ in
the page context).

I used this one (first answer):
http://stackoverflow.com/questions/7370943/retrieving-binary-file-content-using-javascript-base64-encode-it-and-reverse-de

It works great. Just be sure to try-catch the call to base64ArrayBuffer(),
because new Uint8Array(arrayBuffer) may throw an error, and check that
xhr.getResponseHeader('Content-Type') == 'application/pdf' if you're doing PDF
downloads like I was.
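
The "de-base64 afterward" step looks roughly like this. It is a sketch, not my actual script; decodeBase64 is a hypothetical helper name, and it assumes the input is well-formed base64:

```javascript
// Decode a base64 string into a "binary string" (one character per byte).
function decodeBase64(b64) {
  var CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  // Drop padding, whitespace and anything else outside the alphabet.
  var clean = b64.replace(/[^A-Za-z0-9+\/]/g, '');
  var out = '';
  var buffer = 0; // bits collected so far
  var bits = 0;   // number of valid bits in buffer
  for (var i = 0; i < clean.length; i++) {
    buffer = (buffer << 6) | CHARS.indexOf(clean.charAt(i));
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += String.fromCharCode((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return out;
}
```

On the PhantomJS side the result can then be written with require('fs').write(path, decodeBase64(b64), 'wb'); the 'b' in the mode keeps the bytes from being re-encoded as text.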

Original comment by audi...@gmail.com on 4 Sep 2012 at 3:24

GoogleCodeExporter commented 9 years ago
I need this as well. Can't use the XHR method because the inline attachments I 
need to scrape don't come with a URL I can hit.

Original comment by subel...@gmail.com on 4 Oct 2012 at 11:03

GoogleCodeExporter commented 9 years ago
Wouldn't inline attachments be even more easily downloaded? For an image:
var fs = require('fs');
var content = page.evaluate(function() {
  // src is the image's URL (or a data: URI), not its decoded bytes
  return $('img#whatever').attr('src');
});
fs.write(yer_path, content, 'w');

---

Ariya, can you give some estimate of how long this feature (downloading a url) 
would take to implement? I'd love to get involved in PhantomJS development, but 
maybe this issue is a lot trickier than it sounds?

Original comment by audi...@gmail.com on 4 Oct 2012 at 1:59

GoogleCodeExporter commented 9 years ago
Sorry, I didn't mean to write "inline". The file I need is not an image and is 
not part of the DOM. It gets sent as a result of a POST with the 
Content-Disposition header 'attachment;filename="report.csv"'
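
For what it's worth, pulling the filename out of a header like that is the easy part. A rough sketch in plain JavaScript; filenameFromContentDisposition is a made-up helper, not a PhantomJS API, and it deliberately ignores the RFC 5987 filename*= form:

```javascript
// Extract the filename from a Content-Disposition header value.
// Returns null if the value is not an attachment, '' if it is an attachment
// without a filename, and the filename otherwise.
function filenameFromContentDisposition(value) {
  if (!/^\s*attachment\b/i.test(value)) {
    return null;
  }
  // Accept both filename="quoted name.csv" and filename=bare-token.csv
  var match = /;\s*filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(value);
  if (!match) {
    return '';
  }
  return match[1] !== undefined ? match[1] : match[2];
}
```

The hard part is still getting at the response body, which this does nothing about.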

Original comment by subel...@gmail.com on 5 Oct 2012 at 10:51

GoogleCodeExporter commented 9 years ago
Hi there. I think the base64-encoding solution can only be a stop-gap solution.

- Downloading big files will probably exhaust memory, and base64 encoding and
decoding them will use up resources that would have been better spent
elsewhere; therefore we want the option to redirect a downloaded stream to a
file.
- We may have pages where we cannot control the loading of a file that is not
supported (e.g. PDF).
- We may want to save resources that have already been loaded as part of the
page (e.g. images).

I think the optimal solution would be to add functionality to the 
onResourceReceived hook to allow setting up a "redirection" handler, and if 
such a handler is set, unsupported file formats should silently be downloaded. 
This handler could then have another onDownloadFinished hook to resume 
operation once the download is done.
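
To make the detection half of this concrete, here is a sketch using the existing onResourceReceived hook. headerValue, needsDownload and the list of renderable types are my own illustrative assumptions; the redirection itself does not exist yet, and as far as I know the hook only exposes response metadata, not the body:

```javascript
// Return the value of a response header (case-insensitive lookup), or null.
function headerValue(response, name) {
  var headers = response.headers || [];
  var wanted = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === wanted) {
      return headers[i].value;
    }
  }
  return null;
}

// True when a response is a candidate for the proposed "redirect to file"
// handling: it is flagged as an attachment, or its type is not renderable.
function needsDownload(response, renderableTypes) {
  var disposition = headerValue(response, 'Content-Disposition') || '';
  if (/^\s*attachment\b/i.test(disposition)) {
    return true;
  }
  var type = (headerValue(response, 'Content-Type') || '')
    .split(';')[0].replace(/^\s+|\s+$/g, '').toLowerCase();
  return type !== '' && renderableTypes.indexOf(type) === -1;
}

// Only runs under PhantomJS; elsewhere the helpers above are just defined.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.onResourceReceived = function (response) {
    // Illustrative list of types the page is assumed to handle itself.
    var renderable = ['text/html', 'text/plain', 'image/png'];
    if (response.stage === 'start' && needsDownload(response, renderable)) {
      console.log('Would redirect to file: ' + response.url);
    }
  };
}
```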

Original comment by bogusan...@gmail.com on 20 Nov 2012 at 4:48

GoogleCodeExporter commented 9 years ago

Original comment by james.m....@gmail.com on 12 Jan 2013 at 4:33

GoogleCodeExporter commented 9 years ago
Closing. This issue has been moved to GitHub: 
https://github.com/ariya/phantomjs/issues/10052

Original comment by james.m....@gmail.com on 16 Mar 2013 at 12:17