scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Why can't the memory be released even when there is no request? #476

Open tripleday opened 8 years ago

tripleday commented 8 years ago

Hello, I have a few problems with Splash. I hope I can get some help.

kmike commented 8 years ago

_gc calls everything I could find in the Qt API to clear caches, but it looks like sometimes this is not enough. If you want to handle restarts gracefully, the way to go is to set up a load balancer (e.g. haproxy) which reschedules requests to other instances if they are dropped. See e.g. an example config template here (it is not a config itself, it is a cookiecutter template).
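To illustrate the rescheduling idea only: the sketch below is a client-side analogue in Python, not haproxy and not the config template mentioned above. The function name `fetch_with_failover`, the `fetch` callback, and the backend URLs are all hypothetical; it assumes a dropped request surfaces as an `OSError` (connection reset, refused, or timeout).

```python
from typing import Callable, Sequence


def fetch_with_failover(
    page_url: str,
    backends: Sequence[str],
    fetch: Callable[[str, str], str],
) -> str:
    """Try each Splash backend in order and return the first successful result.

    `fetch(backend, page_url)` is a caller-supplied (hypothetical) function
    that performs the actual HTTP request against one Splash instance.
    """
    last_error = None
    for backend in backends:
        try:
            return fetch(backend, page_url)
        except OSError as exc:
            # Assumption: a restarted or crashed instance shows up as a
            # connection-level error, so the request moves to the next one.
            last_error = exc
    raise RuntimeError(f"all {len(backends)} backends failed") from last_error
```

A real load balancer does this transparently for every client, which is why it is the more robust option when several crawlers share the same Splash instances.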

tripleday commented 8 years ago

Thanks for your reply. The way I handle the memory problem is to stop the Docker container every 1000 URLs and start another container listening on another port for the following URLs. This can be done with docker-py. Besides, the memory problem doesn't happen in all cases. When I use the render.json endpoint, memory doesn't grow without bound. With render.html, the problem shows up. Of course, what I saw may be coincidental and needs to be confirmed. Another interesting thing I found is that when the memory problem shows up, Splash renders the URLs very quickly, but when memory stays under control, Splash renders at a low speed of 1 URL per second. When the memory problem occurs, balancing the load won't help much because memory will run out in a short time.
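The recycling scheme described above could look roughly like the following. This is a minimal sketch, not the poster's actual code: `assign_instances`, `start_splash_container`, and `stop_splash_container` are hypothetical names, and alternating between two ports (8050 and 8051) is an assumption made so the replacement never reuses the port of the container it replaces.

```python
from typing import Callable, Iterable, Iterator, Tuple


def assign_instances(
    urls: Iterable[str],
    start_instance: Callable[[int], object],
    stop_instance: Callable[[object], None],
    batch_size: int = 1000,
    base_port: int = 8050,
) -> Iterator[Tuple[object, str]]:
    """Yield (instance, url) pairs, replacing the instance every batch_size URLs."""
    instance = None
    generation = 0
    for index, url in enumerate(urls):
        if index % batch_size == 0:
            if instance is not None:
                stop_instance(instance)  # discard the instance that has grown
            port = base_port + (generation % 2)  # alternate between two ports
            instance = start_instance(port)
            generation += 1
        yield instance, url
    if instance is not None:
        stop_instance(instance)  # clean up the last instance


def start_splash_container(port: int):
    """Start a Splash container with docker-py, published on the given host port."""
    import docker  # imported lazily so the scheduling logic works without docker-py

    client = docker.from_env()
    return client.containers.run(
        "scrapinghub/splash", detach=True, ports={"8050/tcp": port}
    )


def stop_splash_container(container) -> None:
    """Stop and remove a container returned by start_splash_container."""
    container.stop()
    container.remove()
```

Passing `start_splash_container` and `stop_splash_container` as the two callbacks wires the scheduler to Docker. Note that the final cleanup only runs if the generator is consumed to the end.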

PornthipSaechong commented 7 years ago

I had this issue too. I found a workaround: call /execute rather than /render.html and pass a custom Lua script that closes the rendered window. I believe render.html only loads the web page and does not close it, which leads to a memory leak. Forcing the window to close every time frees the memory it was using. After I did this, Docker memory usage became very stable.

Here is the Lua script I use:

    function main(splash)
        local url = splash.args.url
        assert(splash:go(url))
        assert(splash:wait(0.5))
        local html = splash:html()
        splash:runjs("window.close()")
        return html
    end

Hope this helps :)
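For readers wondering how such a script reaches Splash: the /execute endpoint takes the script in the `lua_source` argument, and other arguments such as `url` become available as `splash.args`. The Python sketch below shows one way to send it as a JSON POST. The helper names `build_execute_request` and `render`, and the base address `http://localhost:8050`, are assumptions for illustration.

```python
import json
import urllib.request

# The workaround script from the comment above, unchanged.
LUA_SOURCE = """
function main(splash)
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(0.5))
    local html = splash:html()
    splash:runjs("window.close()")
    return html
end
"""


def build_execute_request(page_url: str, splash_base: str = "http://localhost:8050"):
    """Build a POST request for Splash's /execute endpoint without sending it."""
    payload = json.dumps({"lua_source": LUA_SOURCE, "url": page_url}).encode("utf-8")
    return urllib.request.Request(
        splash_base + "/execute",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render(page_url: str, splash_base: str = "http://localhost:8050", timeout: float = 60.0) -> str:
    """Send the request and return the response body (the HTML returned by main)."""
    request = build_execute_request(page_url, splash_base)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `render("http://example.com")` requires a running Splash instance at the assumed address.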

ngodai commented 5 years ago

@PornthipSaechong thank you for your help. But when I tried to render this site, https://news.zing.vn, with your Lua script, I ran into a problem: memory grew until the machine ran out of memory within only 2-3 minutes (my computer has 8 GB of RAM).

Mideen commented 5 years ago

@PornthipSaechong I tried to close the window through the /execute API after rendering the URL with your sample Lua script, but I think the window is not closed. I can still execute JS, and I can even get the HTML content (splash:html()) after closing the window.

    function main(splash)
        local url = splash.args.url
        assert(splash:go(url))
        assert(splash:wait(0.5))
        local html = splash:html()
        splash:runjs("window.close()")
        return {
            html = splash:html(),
            jsOutput = splash:evaljs('window.location.href')
        }
    end

Mideen commented 5 years ago

Please, can anyone help me understand this?

111qqz commented 5 years ago

Same problem here. Does anyone have an idea?