wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Seleniumwire error with multiprocess pool #384

Open 99hansling opened 3 years ago

99hansling commented 3 years ago

I found that Selenium WebDriver is not thread-safe, so I tried multiprocessing instead. However, with multiprocessing the browser window stays blank and shows only "data:," in the address bar. Selenium Wire worked fine before I switched to multiprocessing. Here is the error log:

Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "G:\Microsoft\Visual Studio Shared\Python37_64\lib\socket.py", line 716, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "G:\Microsoft\Visual Studio Shared\Python37_64\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "G:\Microsoft\Visual Studio Shared\Python37_64\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "G:\Microsoft\Visual Studio Shared\Python37_64\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "G:\alterTestFullMulti.py", line 744, in MultiHandle
    driver = webdriver.Chrome(executable_path='G:/Microsoft/Visual Studio Shared/Python37_64/chromedriver.exe',chrome_options=options)
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\undetected_chromedriver\__init__.py", line 53, in __new__
    instance.__init__(*args, **kwargs)
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\seleniumwire\webdriver.py", line 114, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\common\service.py", line 99, in start
    if self.is_connectable():
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\common\service.py", line 115, in is_connectable
    return utils.is_connectable(self.port)
  File "C:\Users\hans9\AppData\Roaming\Python\Python37\site-packages\selenium\webdriver\common\utils.py", line 106, in is_connectable
    socket_ = socket.create_connection((host, port), 1)
  File "G:\Microsoft\Visual Studio Shared\Python37_64\lib\socket.py", line 716, in create_connection
    sock.connect(sa)

I wonder whether this problem is caused by Selenium Wire's internal proxy, which perhaps cannot run in several processes at the same time.
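For readers following the traceback: the last frames show Selenium's is_connectable() calling socket.create_connection((host, port), 1), that is, probing a local port with a one-second timeout while waiting for chromedriver to start. The sketch below is a stand-alone approximation of that kind of probe, not Selenium's actual code; the function name is_port_connectable and its defaults are made up for illustration.

```python
import socket


def is_port_connectable(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only mirrors the
    socket.create_connection((host, port), 1) call seen in the traceback.
    """
    try:
        # create_connection raises OSError subclasses on refusal or timeout
        # (socket.timeout is itself a subclass of OSError).
        with socket.create_connection((host, port), timeout):
            return True
    except OSError:
        return False
```

A probe like this can help tell apart "chromedriver never started listening" from problems further along, for example by calling it against the port chromedriver was started on.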

wkeeling commented 3 years ago

Thanks for raising this.

There have been some issues reported with running Selenium Wire multithreaded on Windows. It seems less of a problem on Linux.

Can you share the code that you're using to run the webdriver and any options you're passing?

99hansling commented 3 years ago

Thanks for your quick response.

Multithreading is different from multiprocessing in Python: multithreading uses only one process, so the internal proxy can be shared by the webdrivers in multiple threads. I suspect that multiprocessing breaks Selenium Wire's internal proxy because the proxy cannot be instantiated once per process.
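The process-model difference described above can be checked with a small stand-alone sketch. It only demonstrates that worker threads run inside the parent process while pool workers are separate processes; it says nothing about how Selenium Wire's proxy behaves in either case. The function names are made up for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def worker_pid(_):
    """Return the PID of whatever process executes this call."""
    return os.getpid()


def pids_seen_by_threads(n=4):
    # Threads live inside the parent process, so every call reports the parent's PID.
    with ThreadPoolExecutor(max_workers=n) as executor:
        return set(executor.map(worker_pid, range(n)))


def pids_seen_by_processes(n=4):
    # Worker processes have their own PIDs, none of which is the parent's.
    # The set may hold fewer than n entries if some workers take several tasks.
    with ProcessPoolExecutor(max_workers=n) as executor:
        return set(executor.map(worker_pid, range(n)))
```

On Windows, pids_seen_by_processes() should be called from under an if __name__ == '__main__': guard, because worker processes are spawned and re-import the main module.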

Since you mentioned that there are fewer errors on Linux, I'll give that a try.

Here is my code for reference:

def MultiHandle(raw_rules_list,url):
    #webdriver config
    options = webdriver.ChromeOptions()
    #extension for eliminate cookie privacy; location should be changed
    #options.add_extension('C:/Users/hans9/AppData/Local/Google/Chrome/User Data/Default/Extensions/fihnjjcciajhdojfnbdddfaoknhalnja/3.3.0_0.crx')
    #options.add_extension('C:/Users/hans9/AppData/Local/Google/Chrome/User Data/Default/Extensions/aomidfkchockcldhbkggjokdkkebmdll/2.2.2_0.crx')
    options.add_extension('./3.3.0_0.crx')

    driver = webdriver.Chrome(executable_path='G:/Microsoft/Visual Studio Shared/Python37_64/chromedriver.exe',chrome_options=options)
    driver.delete_all_cookies()

    driver.set_page_load_timeout(60)
    driver.set_script_timeout(60)  

    #database
    DBsession,engine = Database.initSession()
    session = DBsession()

    UrlStateFlag=0
    #0:https + no www
    #1:https+www
    #2:http+no www
    #3 http+www

    os.mkdir('./adFraud_dataGathering'+'/'+Url_output_Handling(url))

    try:
        extract_Ad(url,raw_rules_list)

    except UnexpectedAlertPresentException as e:
        print("[Error]"+str(e))
        with open('./adFraud_dataGathering'+'/'+Url_output_Handling(url)+'/'+'UnexpectedAlertPresentException.txt','w',encoding='utf-8') as f:
            f.write("[Error]"+str(e)+'\n')

    except TimeoutException as ex:
        print("Outer Page Loading Time Out")
        with open('./adFraud_dataGathering'+'/'+Url_output_Handling(url)+'/'+'TimeoutException.txt','w',encoding='utf-8') as f:
            f.write("[Page Loading Time Out outer situation]"+"60s has passed, we must go forward; unknown error, we can not go ahead anymore! :"+str(ex)+'\n')

    while len(driver.window_handles)>1:
        driver.switch_to_window(driver.window_handles[-1])
        driver.close()

    driver.switch_to_window(driver.window_handles[0])

    session.close()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=5)

    with open(filename,'r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')

        for row in csv_reader:
            url=scheme + row[0]
            pool.apply_async(MultiHandle,(raw_rules_list,url,))

        pool.close()
        pool.join()

extract_Ad is my main processing function, which ran normally before.

I create the driver and the DB session inside MultiHandle so that each process instantiates its own independent driver and DB session. The same goes for closing the driver and the DB session.

Hope this can help.
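A minimal sketch of the per-process pattern described above, with the driver created inside the worker and released in a finally block so that a failed page load does not leave a browser behind. This is an illustration under assumptions, not the code from this issue: make_driver, handle_url and run_pool are hypothetical names, chromedriver is assumed to be on PATH, and the returned request count exists only to give the worker a result. driver.requests is the Selenium Wire attribute holding the captured requests.

```python
from multiprocessing import Pool


def make_driver():
    """Create one Selenium Wire Chrome driver for the calling process."""
    # Imported lazily so each worker process does its own import and setup.
    from seleniumwire import webdriver  # needs selenium-wire installed

    options = webdriver.ChromeOptions()
    return webdriver.Chrome(options=options)  # assumes chromedriver is on PATH


def handle_url(url, driver_factory=make_driver):
    """Load `url` in a fresh driver and return (url, number of captured requests)."""
    driver = driver_factory()
    try:
        driver.set_page_load_timeout(60)
        driver.get(url)
        return url, len(driver.requests)
    finally:
        # Always release the browser, even if the page load raised.
        driver.quit()


def run_pool(urls, processes=5):
    """Process every URL in a pool of worker processes, one driver per call."""
    with Pool(processes=processes) as pool:
        return pool.map(handle_url, urls)
```

run_pool(...) would be called from under an if __name__ == '__main__': guard, which Windows needs because worker processes are spawned. The driver_factory parameter is there only so the worker can be exercised with a stand-in driver when no browser is available.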

99hansling commented 3 years ago

By the way, I just remembered that I tried driver.quit() to close the webdriver instance, but after that I could not open a new webdriver. Have you encountered this problem?

wkeeling commented 3 years ago

I've not encountered the issue you describe with driver.quit(). Perhaps you could open a new issue with some example code that demonstrates the problem? I'll try to reproduce and fix. Thanks!

99hansling commented 3 years ago

OK, I'll try driver.quit() again later. It may be a problem with Selenium itself rather than with Selenium Wire.

99hansling commented 3 years ago

Thanks for your response! However, I'm more focused on how Selenium Wire behaves with multiprocessing and multithreading. Have you tested both and found any problems or differences between them (for example, in how the proxy runs) that are independent of the OS?