headzoo / surf

Stateful programmatic web browsing in Go.
MIT License

No way to access the underlying transport mechanism #15

Closed. Mojofreem closed this issue 9 years ago.

Mojofreem commented 9 years ago

While attempting to work with an HTTPS URL, I ran into a problem with an unverified x509 certificate:

err = x509: certificate signed by unknown authority

Googling indicated that this is a known issue for some sites under OS X. The recommended fix is to configure the http.Transport to skip verification of the server's certificate. See the following Google Groups thread:

https://groups.google.com/forum/#!topic/golang-nuts/v5ShM8R7Tdc
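For reference, the usual form of this workaround in Go is a transport whose TLS client config sets InsecureSkipVerify. The following is a minimal, standard-library-only sketch, not code from the linked thread; the helper name insecureTransport is made up for illustration. It uses httptest.NewTLSServer, whose self-signed certificate reproduces the unknown-authority error locally.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// insecureTransport (hypothetical helper) returns a transport whose TLS
// client skips certificate-chain and host-name verification. This avoids
// "x509: certificate signed by unknown authority" but also removes
// protection against man-in-the-middle attacks.
func insecureTransport() *http.Transport {
	return &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
}

func main() {
	// A local HTTPS server with a self-signed certificate.
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()

	// A zero-value client uses http.DefaultTransport, which verifies certificates.
	if resp, err := (&http.Client{}).Get(srv.URL); err != nil {
		fmt.Println("default client error:", err)
	} else {
		resp.Body.Close()
	}

	// The same request through the non-verifying transport.
	client := &http.Client{Transport: insecureTransport()}
	resp, err := client.Get(srv.URL)
	if err != nil {
		fmt.Println("insecure client error:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println("insecure client body:", string(body))
}
```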

I modified my version of surf to expose a method for setting the transport, and this fixes my issue locally. Would you mind reviewing my diff and considering it for the mainline?

diff --git a/browser/browser.go b/browser/browser.go
index 48d8afe..01121b7 100644
--- a/browser/browser.go
+++ b/browser/browser.go
@@ -168,6 +169,10 @@ type Browser struct {

        // refresh is a timer used to meta refresh pages.
        refresh *time.Timer
+
+       // transport specifies the mechanism by which individual HTTP
+       // requests are made.
+       transport *http.Transport
 }

 // Open requests the given URL using the GET method.
@@ -423,6 +428,11 @@ func (bow *Browser) SetHeadersJar(h http.Header) {
        bow.headers = h
 }

+// SetTransport sets the http library transport mechanism for each request.
+func (bow *Browser) SetTransport(t *http.Transport) {
+       bow.transport = t
+}
+
 // AddRequestHeader sets a header the browser sends with each request.
 func (bow *Browser) AddRequestHeader(name, value string) {
        bow.headers.Add(name, value)
@@ -498,7 +508,7 @@ func (bow *Browser) Find(expr string) *goquery.Selection {

 // buildClient creates, configures, and returns a *http.Client type.
 func (bow *Browser) buildClient() *http.Client {
-       client := &http.Client{}
+       client := &http.Client{Transport: bow.transport}
        client.Jar = bow.cookies
        client.CheckRedirect = bow.shouldRedirect
        return client
diff --git a/surf.go b/surf.go
index f7ff95f..63f6c6f 100644
--- a/surf.go
+++ b/surf.go
@@ -2,6 +2,8 @@
 package surf

 import (
+       "net/http"
+
        "github.com/headzoo/surf/agent"
        "github.com/headzoo/surf/browser"
        "github.com/headzoo/surf/jar"
@@ -34,6 +36,7 @@ func NewBrowser() *browser.Browser {
                browser.MetaRefreshHandling: DefaultMetaRefreshHandling,
                browser.FollowRedirects:     DefaultFollowRedirects,
        })
+       bow.SetTransport(&http.Transport{})

        return bow
 }
headzoo commented 9 years ago

Sorry for the super late reply, but yes, of course this would be a fine addition. I'll merge it today.

headzoo commented 9 years ago

I made one small change. Instead of this:

client := &http.Client{Transport: bow.transport}

I used this:

    if bow.transport != nil {
        client.Transport = bow.transport
    }
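A plausible reason for this guard, which the thread doesn't spell out: Client.Transport is an http.RoundTripper interface, and storing a nil *http.Transport in it yields an interface value that is not nil. http.Client falls back to http.DefaultTransport only when the interface itself is nil, so the unguarded form would end up calling RoundTrip on a nil *http.Transport once no transport is set by default. The helper names below are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildUnguarded mirrors the original patch: the pointer is always
// wrapped in the RoundTripper interface, even when it is nil.
func buildUnguarded(t *http.Transport) *http.Client {
	return &http.Client{Transport: t}
}

// buildGuarded mirrors the merged version: the field is assigned only
// for a non-nil pointer, so it otherwise stays a true nil interface.
func buildGuarded(t *http.Transport) *http.Client {
	client := &http.Client{}
	if t != nil {
		client.Transport = t
	}
	return client
}

func main() {
	var unset *http.Transport // no transport configured

	fmt.Println(buildUnguarded(unset).Transport == nil) // false: holds (*http.Transport)(nil)
	fmt.Println(buildGuarded(unset).Transport == nil)   // true: DefaultTransport will be used
}
```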

I'm also not setting the transport automatically in surf.NewBrowser(). We'll let the http.Client use its default transport (http.DefaultTransport) unless the developer explicitly sets their own.