headzoo / surf

Stateful programmatic web browsing in Go.
MIT License
1.48k stars 159 forks source link

UnBounded History growth can consume lots of memory #68

Open jtwatson opened 7 years ago

jtwatson commented 7 years ago

I use this lib in production for large batch updates where we may make 100,000+ calls in one run. I experienced exceeding 8Gig memory usage before the program was terminated by the OS.

Researching this issue I found it was simply the History object growing indefinitely until the Browser object was release, at which point the GC would cleanup.

I implemented some optional methods and default settings that do not change the current behavior, but make it simple to limit the growth of the History or just clear the history at will.

PR is on its way.

lxt2 commented 7 years ago

Do you actually need the History tracked for that? I'd probably offer a NoopHistory implementation that just dropped history. I don't use the History stuff either.

jtwatson commented 7 years ago

Yes, I guess that would be another way to approach this. In my situation I use Bookmarks but I never go back into History ether. But if you need history, and it grows too big, then you are in a bad place again.

I used the std lib container/list which is a doubly linked list. In my code I set the max to 10 or so and allow it to flow through the list. Not a lot of overhead to manage the pointers in the list and it keeps the History option open.

headzoo commented 7 years ago

Having a Browser.Close() method which frees resources sounds like a good idea.

jtwatson commented 7 years ago

If you called Browser.Close(), would you expect to be able to use it again?

With the current code, simply allowing the Browser object to go out of scope effectively releases all resources to the GC to cleanup.

My issue came in when I was needing to use the Browser over many requests. So I added a method Browser.ClearHistory() to explicitly clear history at a given point. I also added a Browser.SetMaxHistory(max int) to limit history growth to a number of items. I left the default at unlimited as to not change the exiting behavior and possibly cause a BC issue, but it might be okay to have a sensible default here of 100 items or something like that.

headzoo commented 7 years ago

The two methods to clear and limit the history sound good to me too, though I would probably put those methods on the jar.History interface. (In which case they would be named Clear() and SetMax().)

jtwatson commented 7 years ago

Those two methods do exist on jar.History, but the current Browser does not provide access to Browser.history. So you would have to do something like this:

b := surf.NewBrowser()
hist := jar.NewMemoryHistory()
hist.Max(100)
b.SetHistoryJar(hist)

Because this felt to painful, I added the corresponding methods to the Browser.

b := surf.NewBrowser()
b.SetMaxHistory(100)

Maybe it would just be better to make the History object accessible.

headzoo commented 7 years ago

Maybe it would just be better to make the History object accessible.

I think that approach would work best, and if we're doing that, we should make the other jar objects accessible as well.

jtwatson commented 7 years ago

Okay. I will work on that.

jtwatson commented 7 years ago

I removed the methods browser.SetMaxHistory() and browser.ClearHistory(). I also exposed 4 jar objects so they can be manipulated directly.

headzoo commented 7 years ago

Looks good, though the Browsable interface is missing the getter methods. Thanks for your effort in making these changes!

jtwatson commented 7 years ago

Done. We now have the same methods as part of the Browsable interface.

jtwatson commented 7 years ago

The last item that I would like to do is change the default history length to 100. What is the procedure to make this in a patch release?

headzoo commented 7 years ago

There's no procedure in place at the moment. Unless @lxt2 has some other idea, I would think we should tag v1.0.1 and then merge your changes.

lxt2 commented 6 years ago

Is there a PR for this that I've missed?

jtwatson commented 6 years ago

I just added a PR to change the default.

MikhailKlemin commented 4 years ago

I do not understand now... how would you limit history to 1 page now?

jtwatson commented 4 years ago

This should work:

b := surf.NewBrowser()
b.HistoryJar().SetMax(1)
MikhailKlemin commented 4 years ago

This should work:

b := surf.NewBrowser()
b.HistoryJar().SetMax(1)

There no HistoryJar() method

jtwatson commented 4 years ago

The PR was merged into the master branch, but has not been released with a version tag...

Checkout master and you should be seeing the new Methods.