zytedata / zyte-smartproxy-selenium

A wrapper over Selenium Wire to provide Zyte Smart Proxy Manager specific functionalities.

Passing Session ID to new webdriver from another webdriver does not work #6

Closed mfahmirukman closed 1 year ago

mfahmirukman commented 1 year ago

Hi, I'm trying to reuse a session ID as described in this documentation.

I tried passing the session ID as the 'X-Crawlera-Session' header for a new webdriver, but the new webdriver creates a new session instead of using the one I passed. Is this intended?

@contextmanager
def init_driver(self, session_id=None):
    opts = Options()
    opts.add_argument("--headless")
    opts.add_argument("--no-sandbox")
    opts.add_argument("--disable-gpu")
    opts.add_argument("--disable-dev-shm-usage")
    opts.add_argument("--disable-extensions")
    opts.add_argument(f"user-agent={USER_AGENT}")
    # s = Service("/usr/bin/chromedriver")
    # return zyte_driver.Chrome(service=s, options=opts)

    # Basically to reuse the session instead of creating new one every time
    # We pass it as parameter from the uppermost driver
    # Reference https://docs.zyte.com/smart-proxy-manager/sessions.html#example-using-python-and-scrapy
    if session_id is None:
        print("session_id None")
        driver = zyte_driver.Chrome(
            chrome_options=opts,
            spm_options={
                "spm_apikey": ZYTE_SMART_PROXY_API_KEY,
                "headers": {
                    "X-Crawlera-No-Bancheck": "1",
                    "X-Crawlera-Profile": "desktop",
                    "X-Crawlera-Cookies": "disable",
                    "X-Crawlera-Session": "create",
                },
                # 'static_bypass': False,
            },
        )
    else:
        print("session_id is passed: ", session_id)
        opts.add_argument(f"X-Crawlera-Session:{session_id}")
        driver = zyte_driver.Chrome(
            chrome_options=opts,
            spm_options={
                "spm_apikey": ZYTE_SMART_PROXY_API_KEY,
                "headers": {
                    "X-Crawlera-No-Bancheck": "1",
                    "X-Crawlera-Profile": "desktop",
                    "X-Crawlera-Cookies": "disable",
                },
                # 'static_bypass': False,
            },
        )
    try:
        yield driver
    finally:
        print("driver.spm_session_id", driver.spm_session_id)
        driver.quit()

The main function is something like this:

if __name__ == "__main__":
    with self.init_driver() as driver:
        driver.get(url)
        with self.init_driver(session_id=driver.spm_session_id) as driver2:
            driver2.get(url)

The output I get is this:

session_id None
session_id is passed:  1003910761
driver.spm_session_id 620371626
session_id is passed:  1003910761
driver.spm_session_id 1699005727
session_id is passed:  1003910761
driver.spm_session_id 1003910761

storymode7 commented 1 year ago

Hi @mfahmirukman,

It should be passed as a header in spm_options instead of Chrome options.

Currently this driver ignores the header and always generates a new session ID. I've worked out a fix and will try to release it tomorrow.

storymode7 commented 1 year ago

Hi @mfahmirukman,

The new version 1.0.10 is released: https://pypi.org/project/Zyte-SmartProxy-Selenium/
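
For readers following along, here is a minimal sketch of what passing the session ID through spm_options could look like, based on the advice earlier in this thread. The helper name build_spm_options is hypothetical (it is not part of the library), the header values are copied from the original post, and it assumes version 1.0.10 or later honors the X-Crawlera-Session header as described above.

```python
def build_spm_options(api_key, session_id=None):
    """Build the spm_options dict for zyte_driver.Chrome.

    Hypothetical helper, not part of zyte-smartproxy-selenium.
    With no session_id, ask Smart Proxy Manager to create a session;
    otherwise send the existing ID in the X-Crawlera-Session header.
    """
    headers = {
        "X-Crawlera-No-Bancheck": "1",
        "X-Crawlera-Profile": "desktop",
        "X-Crawlera-Cookies": "disable",
        # The session header goes here, not into Chrome's command-line options.
        "X-Crawlera-Session": "create" if session_id is None else str(session_id),
    }
    return {"spm_apikey": api_key, "headers": headers}


# Assumed usage, mirroring the constructor call from the original post:
#
#   driver = zyte_driver.Chrome(
#       chrome_options=opts,
#       spm_options=build_spm_options(ZYTE_SMART_PROXY_API_KEY, session_id),
#   )
```

Building both variants from one function also removes the duplicated if/else branches in init_driver, so the only difference between a new and a reused session is the value of a single header.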

mfahmirukman commented 1 year ago

Thank you @storymode7 :pray: