scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Using Selenium in project, but geckodriver does not close after the job finishes. #378

Closed Dashu-Xu closed 2 years ago

Dashu-Xu commented 4 years ago

Hi, thanks to all the developers of this project!

While using scrapyd, I ran into a problem. I'm using Selenium in my project; when a job finishes, the browser window closes as usual, but the driver keeps running in the background, so the job still shows up under the running items. I tried to kill the job via its PID, but I can't find a process with that PID in Task Manager.

I'm running scrapyd on Windows Server 2012 with Scrapy 1.5.0 and scrapyd 1.2.1.

Is there any way to terminate the driver? I'm afraid that if it isn't killed, it will slow the server down day by day.

Dashu-Xu commented 4 years ago

Would it help to add some code like this to webservice.py, at line 98?

import os
import psutil

def kill_driver(self):
    # Walk every process and force-kill leftover Firefox/geckodriver
    # instances (Windows-only, since it shells out to taskkill).
    for pid in psutil.pids():
        try:
            name = psutil.Process(pid).name()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
        if name == "firefox.exe":
            os.system('taskkill /F /IM firefox.exe')
        elif name == "geckodriver.exe":
            os.system('taskkill /F /IM geckodriver.exe')

This will only work for Firefox, though.
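
A more general variant (just a sketch; the Chrome names are illustrative, and it assumes psutil is installed) could match any driver binary by name and use psutil's own kill() instead of shelling out to taskkill:

import psutil

# Hypothetical name set; extend it for whichever browsers/drivers are in use.
DRIVER_NAMES = {"firefox.exe", "geckodriver.exe", "chrome.exe", "chromedriver.exe"}

def kill_drivers():
    for proc in psutil.process_iter(attrs=["name"]):
        if proc.info["name"] in DRIVER_NAMES:
            try:
                proc.kill()  # cross-platform; no taskkill needed
            except psutil.NoSuchProcess:
                pass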

kutschkem commented 4 years ago

How do you start the driver?

Dashu-Xu commented 4 years ago

> How do you start the driver?

I start the driver in a middleware, like this:

from logging import getLogger

from selenium import webdriver

class SeleniumMiddleware:
    def __init__(self, server_ip, proxyde, timeout=None):
        self.logger = getLogger(__name__)
        self.timeout = timeout or 60  # fall back to 60 seconds if none given
        self.server_ip = server_ip
        self.proxyde = proxyde
        # One Firefox (and geckodriver) instance starts per middleware.
        self.browser = webdriver.Firefox()

kutschkem commented 4 years ago

You shouldn't kill the process by its PID; use the driver's API method to close it, which is quit(). That should spare you the trouble of dealing with processes directly.

https://stackoverflow.com/a/41564880/1319284
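
The linked answer boils down to close() versus quit(); a minimal sketch:

from selenium import webdriver

driver = webdriver.Firefox()
try:
    driver.get("https://example.com")
finally:
    # close() only closes the current window and leaves geckodriver running;
    # quit() closes every window and shuts down the geckodriver process too.
    driver.quit()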

Dashu-Xu commented 4 years ago

After I call the cancel API, the browser stops crawling data, but it just stays there without closing itself. I wanted to close it via the spider_closed signal, but somehow it doesn't work:

from pydispatch import dispatcher
from scrapy import signals

dispatcher.connect(self.quit_browser, signals.spider_closed)  # in __init__

def quit_browser(self, spider, reason):
    print("[+] Spider Closed With: ", reason)
    self.browser.quit()

I found some discussions on Stack Overflow; people there said it's a kind of bug on Windows.
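
For reference, the documented Scrapy idiom is to connect the handler through the crawler's signal manager rather than pydispatch. A sketch, with hypothetical setting names, assuming the __init__ and quit_browser shown above (it can't help, of course, if the signal never fires):

from scrapy import signals

class SeleniumMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        # SELENIUM_SERVER_IP / SELENIUM_PROXY are made-up setting names;
        # use whatever the project's settings.py actually defines.
        middleware = cls(
            server_ip=crawler.settings.get('SELENIUM_SERVER_IP'),
            proxyde=crawler.settings.get('SELENIUM_PROXY'),
        )
        crawler.signals.connect(middleware.quit_browser, signal=signals.spider_closed)
        return middleware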

kutschkem commented 4 years ago

@Dashu-Xu Yes, #83 is still open. I don't know if there is another way to fix your problem. Cancel doesn't trigger shutdown handlers.
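
One possible fallback (an untested sketch; atexit only runs when the Python process exits cleanly, so a hard kill will still leave the browser behind) is to register the quit alongside the signal handler:

import atexit

from selenium import webdriver

class SeleniumMiddleware:
    def __init__(self):
        self.browser = webdriver.Firefox()
        # If spider_closed never fires, quit the browser when the interpreter
        # exits normally. A forced process kill bypasses this too.
        atexit.register(self.browser.quit)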

Dashu-Xu commented 4 years ago

> @Dashu-Xu Yes, #83 is still open. I don't know if there is another way to fix your problem. Cancel doesn't trigger shutdown handlers.

Thanks for your reply. For now, I still shut down the browser via the taskkill command.

jpmckinney commented 2 years ago

Closing as the remaining issue is the same as #83.