headzoo / surf

Stateful programmatic web browsing in Go.
MIT License
1.48k stars 159 forks source link

Implement http.Client reuse so we are not creating a new client with each call #46

Closed jtwatson closed 7 years ago

jtwatson commented 7 years ago

Implement http.Client reuse so we are not creating a new client for every request. Misc comment cleanup.

This change should not cause any BC breaks that I can see. I used a Singleton to create the client and to replace the buildClient method. Appropriate setters were changed to operate on the client as well, so all usage cases should be the same.

lxt2 commented 7 years ago

This seems pretty sensible. To me, chaining the singleton methods doesn't read as very go-like. Thoughts on replacing those httpClient call sites with something like:

func (bow *Browser) SiteCookies() []*http.Cookie {
   if bow.client == nil {
      bow.client = bow.createClient()
   }
   return bow.client.Jar.Cookies(bow.Url())
}

To me that feels more explicit, but it might just be a matter of style.

jtwatson commented 7 years ago

I agree. I like the more explicit code. Easier to clearly understand what is happening.

headzoo commented 7 years ago

Keep in mind one of the goals here is allowing devs to create hundreds of browsers performing different tasks in separate goroutines. Changes you make need to be essentially "thread safe." You can reuse objects as long as doing so doesn't interfere with other browsers doing different things.

jtwatson commented 7 years ago

Yes. That was my intent.

The current code instantiates a new client := &http.Client{} for each call, setting client.Jar, client.CheckRedirect and client.Transport to their respective instance which persist for the life of the Browser{} object.

My change creates the client only one time during the life of the Browser object, setting Jar, Transport and CheckRedirect in the same manner. Since http.Client is fully go-routine safe, this should not introduce any issues.

For https and especially http2 connections, there should be significant performance benefits from this change. Each time the client is recreated, all TLS connections will need to be re-negotiated from scratch. By using a persistent client, existing connections can be reused.

In my use case, I am connecting to a single endpoint and making 100,000+ calls for a large batch operation once per day. I have been using this patched code in production for about 8 months now and it seems to be solid.

headzoo commented 7 years ago

@jtwatson - Excellent. Glad to hear the library is being used in production, and happy to have your contributions.

lxt2 commented 7 years ago

Changes you make need to be essentially "thread safe."

Yup, agreed. That's also a design goal of mine too @headzoo :)

lxt2 commented 7 years ago

🚀