headzoo / surf

Stateful programmatic web browsing in Go.
MIT License

Any interest in a collaborator? #63

lxt2 closed 7 years ago

lxt2 commented 7 years ago

There seem to be a lot of open PRs and issues that could pretty easily be brought in without breaking BC.

I end up using Surf quite a bit in various projects, and I would be quite happy to help out if you are inclined, @headzoo.

headzoo commented 7 years ago

I've been hesitant to accept PRs as I work on v2.0, but work has kept me too busy to give it the attention it needs. So yeah, I don't mind giving commit access to you and some other people to help keep v1.0 moving along.

lxt2 commented 7 years ago

No probs. Is there anything written up about design goals for v2?

headzoo commented 7 years ago

I haven't written down any specific goals yet, but I have written some code. I may end up deleting the v2 branch since it only serves to document my thoughts.

https://github.com/headzoo/surf/tree/v2

My long-term goal with Surf v3.0 is having it behave like a true "headless" browser, specifically giving it the ability to run JavaScript, most likely by using Chrome V8 bindings. The trick with running JS inside of Surf is recreating the browser API. Chrome V8 executes JavaScript, but it doesn't create a DOM or a window object, i.e. the whole browser API that scripts like jQuery expect to exist. That browser API has to be recreated in Surf.

My short term goal is taking the first steps towards that goal. So v2.0 will have an API similar to browsers in preparation for binding Surf to Chrome V8. It will have a new API that feels more like a real browser.

So, instead of getting the page title like this:

browser := surf.NewBrowser()
if err := browser.Open("http://golang.org"); err != nil {
	panic(err)
}
fmt.Println(browser.Title())

It would be done like this:

browser := surf.NewBrowser()
window := browser.Open("http://golang.org")
fmt.Println(window.Document.Title())

This mimics how the page title would be read in JavaScript land, i.e. by reading window.document.title.
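A minimal sketch of how that window/document shape might look in Go. Everything here is hypothetical and not taken from the v2 branch: the `Window` and `Document` types, the `NewWindow` constructor, and the regular-expression title lookup (used only to keep the sketch within the standard library; a real implementation would use an HTML parser and would fetch the page itself).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Document is a hypothetical stand-in for the browser's document object.
type Document struct {
	html string
}

// titleRe captures the contents of the first <title> element.
// (?is) makes the match case-insensitive and lets "." span newlines.
var titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// Title returns the trimmed text of the <title> element, or "" if there is none.
func (d *Document) Title() string {
	m := titleRe.FindStringSubmatch(d.html)
	if m == nil {
		return ""
	}
	return strings.TrimSpace(m[1])
}

// Window is a hypothetical counterpart of the JavaScript window object.
type Window struct {
	Document *Document
}

// NewWindow wraps HTML that has already been fetched (an assumption of this
// sketch; the proposed browser.Open would do the fetching).
func NewWindow(html string) *Window {
	return &Window{Document: &Document{html: html}}
}

func main() {
	window := NewWindow("<html><head><title> The Go Programming Language </title></head></html>")
	fmt.Println(window.Document.Title())
}
```

The point of the shape is that `window.Document.Title()` reads the same way as `window.document.title`, so later objects (location, history, and so on) would have an obvious place to live.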

Right now I'm just kicking around some ideas, and not taking anything too seriously. Recreating the full browser API may be too large a task. We'll see how it goes.

headzoo commented 7 years ago

P.S. Don't forget to add your name to the README.md. ;)

lxt2 commented 7 years ago

I've been working on something similar @headzoo.

To be frank, I deeply suspect that it's not possible to do in a general way in pure Go. Browsers move too quickly at the JS level; trying to keep up would be really, really hard. Watching PhantomJS try to keep up is a good example of how hard it is, even when you have a native JS engine. I see something like the new Chrome Headless bindings as the way forward if you really want a headless browser.

I'd be interested in collaborating on something somewhere in the middle though: a simple Browser interface like Surf's, but with the ability to run certain scripts with DOM access. I just wouldn't ever expect it to run all scripts in a page like a browser would.

headzoo commented 7 years ago

I think we're on the same page. Once I really started digging into recreating the DOM I saw how futile the effort would be. Creating a full-blown browser is literally a full-time job. But I agree that we can tip-toe into the JavaScript game by supporting some basic, plain vanilla JavaScript commands. It might not be very useful since most sites are using jQuery/React/Angular/etc, which use features we could never hope to support, but I think the idea of executing JavaScript is worth playing around with.

The only thing I really have in mind with v2.0 is getting any big breaking API changes out of the way. Meaning, taking a look at the current API and deciding what's good, what sucks, and fixing those problems in the next version. I think v2.0 is the best time for the "big rewrite" and then plan to keep the API stable from then on out. Or at least until v4.0. ;)

lxt2 commented 7 years ago

Yup, cool, agreed. I'd love a way to do Middleware or Plugins so that we can prototype some of this stuff outside of the Surf core.

headzoo commented 7 years ago

I always planned on a plugin system, but now I'm leaning towards supporting middleware. It's a concept most developers are familiar with, and middleware provides endless extensibility while being fairly simple to implement.

We could go with a hybrid approach, and use Go's new "plugin" package to add support for pluggable middleware. That would make it simple to enable/disable middleware via configuration instead of having to recompile to add/remove middleware.
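A rough sketch of what loading middleware through the standard `plugin` package might look like. The `LoadMiddleware` helper, the exported symbol name `Middleware`, and the `.so` path are all assumptions made for illustration. Only `plugin.Open` and `Lookup` come from the standard library, and the package only works on a few platforms (Linux, FreeBSD, macOS) with cgo enabled.

```go
package main

import (
	"fmt"
	"net/http"
	"plugin"
)

// Middleware wraps one RoundTripper in another (hypothetical type).
type Middleware func(next http.RoundTripper) http.RoundTripper

// LoadMiddleware opens a plugin built with `go build -buildmode=plugin` and
// looks up a symbol named "Middleware". The symbol name is a convention
// invented for this sketch; the plugin is assumed to export
//
//	func Middleware(next http.RoundTripper) http.RoundTripper
func LoadMiddleware(path string) (Middleware, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %q: %w", path, err)
	}
	sym, err := p.Lookup("Middleware")
	if err != nil {
		return nil, fmt.Errorf("lookup Middleware in %q: %w", path, err)
	}
	// For an exported function, Lookup returns the function value itself.
	fn, ok := sym.(func(http.RoundTripper) http.RoundTripper)
	if !ok {
		return nil, fmt.Errorf("plugin %q: Middleware has unexpected type %T", path, sym)
	}
	return fn, nil
}

func main() {
	// In practice the paths would come from configuration; this one is a placeholder.
	transport := http.DefaultTransport
	for _, path := range []string{"./middleware/logging.so"} {
		mw, err := LoadMiddleware(path)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		transport = mw(transport)
	}
	_ = &http.Client{Transport: transport}
}
```

Enabling or disabling a middleware then becomes a matter of editing the configured path list, with the caveat that the plugin and the host program must be built with the same Go toolchain and dependency versions.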

headzoo commented 7 years ago

Though I wonder if the pluggable middleware idea would be best implemented at a higher level than Surf.