vkryukov closed this issue 9 years ago
Looking at the code, it looks like browser.httpRequest unconditionally calls goquery.NewDocumentFromResponse, which parses the body as HTML. I'm not sure what the best way to override this behavior is. Any ideas?
I could have built an http.Client myself; however, Browser.cookies is not exported, and neither is Browser.buildClient. Maybe we should export the latter for cases such as this?
Another idea I tried was to use DownloadAsset, but it uses plain http.Get and so cannot leverage cookies already set by the bow. That, BTW, will make it impossible to download assets that require authorization.
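To illustrate the cookie point above with the standard library only (this is not surf's code, and the /login and /report.csv routes are a made-up test server), here is a rough sketch: a client that shares a cookie jar with the login request can fetch the protected CSV, while a bare client, which behaves like plain http.Get in this respect, is rejected.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newTestServer returns a hypothetical site: /login sets a session cookie,
// and /report.csv serves CSV only when that cookie is present.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc", Path: "/"})
		fmt.Fprint(w, "<html><body>ok</body></html>")
	})
	mux.HandleFunc("/report.csv", func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("session"); err != nil || c.Value != "abc" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "text/csv")
		fmt.Fprint(w, "id,name\n1,alice\n")
	})
	return httptest.NewServer(mux)
}

// fetch performs a GET with the given client and returns the status and raw body.
func fetch(client *http.Client, url string) (int, string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(b), err
}

// runDemo logs in with a jar-backed client, then requests the CSV twice:
// once with that client and once with a client that has no cookies.
func runDemo() (int, int, string, error) {
	srv := newTestServer()
	defer srv.Close()

	jar, err := cookiejar.New(nil)
	if err != nil {
		return 0, 0, "", err
	}
	client := &http.Client{Jar: jar}
	if _, _, err := fetch(client, srv.URL+"/login"); err != nil {
		return 0, 0, "", err
	}
	withJar, body, err := fetch(client, srv.URL+"/report.csv")
	if err != nil {
		return 0, 0, "", err
	}
	withoutJar, _, err := fetch(&http.Client{}, srv.URL+"/report.csv")
	return withJar, withoutJar, body, err
}

func main() {
	withJar, withoutJar, body, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("with cookie jar:", withJar)
	fmt.Println("without cookie jar:", withoutJar)
	fmt.Print(body)
}
```

If the browser exposed its client or cookie jar, a download helper could reuse the session in the same way; whether that fits surf's design is an open question.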
I'm also having a problem getting a CSV file, but I can't even get as far as @victorkryukov:
func SavedQuery(q int) {
	// Takes a saved query ID from SIS and downloads the exported CSV.
	query := fmt.Sprintf("https://example.com/esweb.asp?WCI=Results&Query=%v", q)
	err := bow.Open(query)
	if err != nil {
		panic(err)
	}
	// Accessing the exported URL directly does not work.
	// I have to go to the saved query URL first and then click 'Export'.
	bow.DelRequestHeader("Referer")
	err = bow.Click("a:contains(' Export')") // capture the error so the check below is meaningful
	if err != nil || bow.StatusCode() != 200 {
		panic(fmt.Sprintf("export click failed: err=%v status=%d", err, bow.StatusCode()))
	}
	// Next click on "Comma-Delimited Text File".
	bow.DelRequestHeader("Referer")
	err = bow.Click("a:contains('Comma-Delimited Text File')")
	if err != nil || bow.StatusCode() != 200 {
		panic(fmt.Sprintf("format click failed: err=%v status=%d", err, bow.StatusCode()))
	}
	// Next click on the link to download the CSV.
	bow.DelRequestHeader("Referer")
	// f := bow.Links()[0]
	// bow.Download(f.URL)
	err = bow.Click("a:contains('.csv')")
	if err != nil || bow.StatusCode() != 200 {
		// fmt.Println(bow.Body())
		// fmt.Println(bow.StatusCode())
		// fmt.Println(bow.ResponseHeaders())
		// fmt.Println(bow)
		// panic(err)
	}
	fmt.Println(bow.Body())
}
This returns StatusCode 406. Here is the body of the response.
<h1>The resource cannot be displayed</h1>
The page you are looking for cannot be opened by your browser because it has a file name extension that your browser does not accept.
<hr/>
<p>Please try the following:</p>
<ul>
<li>Change the Multipurpose Internet Mail Extensions (MIME) or security settings of your browser to accept the file name extension of the requested page. Note that your browser might currently be configured in a highly secure mode that protects your computer. Please read the Help for your browser before changing any settings.</li>
</ul>
<h2>HTTP Error 406 - Client browser does not accept the MIME type of the requested page.<br/>Internet Information Services (IIS)</h2>
<hr/>
<p>Technical Information (for support personnel)</p>
<ul>
<li>Go to <a href="http://go.microsoft.com/fwlink/?linkid=8180">Microsoft Product Support Services</a> and perform a title search for the words <b>HTTP</b> and <b>406</b>.</li>
<li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
and search for topics titled <b>Setting Application Mappings</b>, <b>Securing Your Site with Web Site Permissions</b>, and <b>About Custom Error Messages</b>.</li>
</ul>
Thanks for submitting a ticket. The Content-Disposition:[attachment; filename=source.csv] header instructs the browser to save the page as a file of the type specified by the Content-Type:[text/csv; charset=ISO-8859-1] header. @victorkryukov is right. The Download() method blindly assumes the current page is text/html. In fact, the method doesn't take the response headers into consideration at all.
I'll try to create a fix today.
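For anyone who wants to inspect those two headers themselves, here is a small standard-library sketch (not what surf does internally). The helper names attachmentInfo and exampleHeaders are made up for illustration; the header values are the ones quoted above.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// exampleHeaders returns the two response headers quoted in this thread.
func exampleHeaders() http.Header {
	h := http.Header{}
	h.Set("Content-Disposition", "attachment; filename=source.csv")
	h.Set("Content-Type", "text/csv; charset=ISO-8859-1")
	return h
}

// attachmentInfo extracts the media type, the charset, and the suggested
// filename from a set of response headers.
func attachmentInfo(h http.Header) (string, string, string, error) {
	mediaType, params, err := mime.ParseMediaType(h.Get("Content-Type"))
	if err != nil {
		return "", "", "", err
	}
	filename := ""
	if cd := h.Get("Content-Disposition"); cd != "" {
		// ParseMediaType also handles Content-Disposition values.
		_, cdParams, err := mime.ParseMediaType(cd)
		if err != nil {
			return "", "", "", err
		}
		filename = cdParams["filename"]
	}
	return mediaType, params["charset"], filename, nil
}

func main() {
	mediaType, charset, filename, err := attachmentInfo(exampleHeaders())
	if err != nil {
		panic(err)
	}
	fmt.Println(mediaType, charset, filename)
}
```

A download routine could use the media type to decide whether the body should go through an HTML parser at all.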
This should be fixed in the latest master. The Download() method now writes the raw response body instead of using the value of bow.state.Dom.Html().
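The general idea of writing the raw body can be sketched with the standard library alone. This is not the actual surf change; saveRaw and rawDemo are hypothetical names and the CSV content is invented. The point is that copying bytes straight from the response leaves characters such as < and & untouched, because nothing is parsed or re-serialized as HTML.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// saveRaw streams the response body to dst byte for byte, without parsing it.
func saveRaw(resp *http.Response, dst io.Writer) (int64, error) {
	defer resp.Body.Close()
	return io.Copy(dst, resp.Body)
}

// rawDemo serves a small CSV from a local test server and returns
// exactly what saveRaw wrote.
func rawDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/csv; charset=ISO-8859-1")
		fmt.Fprint(w, "name,expr\nalice,a<b&c\n")
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if _, err := saveRaw(resp, &buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := rawDemo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

In real use, dst would typically be an *os.File rather than a buffer.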
Hi @headzoo - I can confirm that my issue is fully resolved now. Many thanks!
@victorkryukov - Thank you!
Hello,
I'm using surf to log in to a website and download a CSV report. My problem is that the report is downloaded as an HTML file, not as plain text: after downloading, I get a file that is prepended with <html><head></head><body>, some symbols are HTML-escaped, etc.
When I print the headers, I get:
Looks like the content-type text/csv is not recognized. When I follow the reportURL link with a browser, the file is downloaded properly.
Any advice on the best way to download the file properly? Or maybe it's a bug/feature request...