seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Encountered InvalidSessionIdException occasionally in local docker run #2694

Closed flairekq closed 2 months ago

flairekq commented 2 months ago

Hi, I've been testing SeleniumBase with Docker locally. The image builds and runs successfully, but some runs fail with selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id at this call:

sb.driver.uc_open_with_reconnect(https://www.thaiticketmajor.com/all-event/, 4)

Traceback:

Traceback (most recent call last):
  File "//test.py", line 43, in <module>
    main()
  File "//test.py", line 20, in main
    sb.driver.uc_open_with_reconnect(https://www.thaiticketmajor.com/all-event/, 4)
  File "/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py", line 3753, in <lambda>
connecting to url
    lambda *args, **kwargs: uc_open_with_reconnect(
  File "/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py", line 447, in uc_open_with_reconnect
    driver.switch_to.window(driver.window_handles[-1])
  File "/usr/local/lib/python3.10/dist-packages/seleniumbase/undetected/__init__.py", line 326, in __getattribute__
    return super().__getattribute__(item)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 494, in window_handles
    return self.execute(Command.W3C_GET_WINDOW_HANDLES)["value"]
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 347, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
Stacktrace:
#0 0x56520bf7e863 <unknown>
#1 0x56520bc74717 <unknown>
#2 0x56520bcb18fd <unknown>
#3 0x56520bce1484 <unknown>
#4 0x56520bcdc123 <unknown>
#5 0x56520bcdb2bc <unknown>
#6 0x56520bc41ae8 <unknown>
#7 0x56520bf4284b <unknown>
#8 0x56520bf467a5 <unknown>
#9 0x56520bf30571 <unknown>
#10 0x56520bf47332 <unknown>
#11 0x56520bf1587f <unknown>
#12 0x56520bc400ee <unknown>
#13 0x7fa9366acd90 <unknown> 

Dockerfile (based on the latest one committed to the repo, with changes in the "Set up SeleniumBase" and entrypoint sections):

# SeleniumBase Docker Image
FROM ubuntu:22.04

#============================
# Install Linux Dependencies
#============================
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libatspi2.0-0 \
    libcups2 \
    libdbus-1-3 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libwayland-client0 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxkbcommon0 \
    libxrandr2 \
    xdg-utils \
    libu2f-udev \
    libvulkan1

#=================================
# Install Bash Command Line Tools
#=================================
RUN apt-get -qy --no-install-recommends install \
    sudo \
    unzip \
    wget \
    curl \
    vim \
    xvfb \
  && rm -rf /var/lib/apt/lists/*

#================
# Install Chrome
#================
RUN curl -LO  https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN apt-get install -y ./google-chrome-stable_current_amd64.deb
RUN rm google-chrome-stable_current_amd64.deb

#=======================================
# Install Python and Basic Python Tools
#=======================================
RUN apt-get -o Acquire::Check-Valid-Until=false -o Acquire::Check-Date=false update
RUN apt-get install -y python3 python3-pip python3-setuptools python3-dev python-distribute
RUN alias python=python3
RUN echo "alias python=python3" >> ~/.bashrc

#===========================
# Configure Virtual Display
#===========================
RUN set -e
RUN echo "Starting X virtual framebuffer (Xvfb) in background..."
RUN Xvfb -ac :99 -screen 0 1280x1024x16 > /dev/null 2>&1 &
RUN export DISPLAY=:99
RUN exec "$@"

#=======================
# Update Python Version
#=======================
RUN apt-get update -y
RUN apt-get -qy --no-install-recommends install python3.10
RUN rm /usr/bin/python3
RUN ln -s python3.10 /usr/bin/python3

#=============================================
# Allow Special Characters in Python Programs
#=============================================
RUN export PYTHONIOENCODING=utf8
RUN echo "export PYTHONIOENCODING=utf8" >> ~/.bashrc

#=====================
# Set up SeleniumBase
#=====================
COPY requirements.txt ./requirements.txt
RUN pip3 install --upgrade pip setuptools wheel
RUN pip3 install -r requirements.txt --upgrade

#=======================
# Download chromedriver
#=======================
RUN sbase get chromedriver --path

#==========================================
# Create entrypoint and grab example tests
#==========================================
COPY test.py ./test.py
ENTRYPOINT ["python3"]
CMD ["test.py"]

Thank you in advance.

mdmintz commented 2 months ago

URLs are Python strings, not variables. You need to put quotes around them. You probably had a loop with a try/except, which masked your actual issue and made the browser close. InvalidSessionIdException would occur if you ran Selenium commands after the browser window was already closed.
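
One way to keep a failing step from being masked by a later InvalidSessionIdException is to stop at the first error and report it with its traceback. The sketch below assumes that approach; `run_steps` and `main` are hypothetical helper names, not part of SeleniumBase, while the `SB(uc=True)`, `uc_open_with_reconnect`, and `uc_click` calls are the ones already used in this thread.

```python
import traceback


def run_steps(steps):
    """Run (label, callable) pairs in order and stop at the first failure.

    Returns (completed_labels, first_error), where first_error is either
    None or a (label, exception) tuple for the step that failed.
    """
    completed = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:
            # Print the real cause now, before any follow-up command can
            # fail with "invalid session id" and hide it.
            traceback.print_exc()
            return completed, (label, exc)
        completed.append(label)
    return completed, None


def main():
    # Imported here so that run_steps can be used without SeleniumBase.
    from seleniumbase import SB

    # The URL is a quoted Python string.
    url = "https://www.thaiticketmajor.com/all-event/"
    with SB(uc=True) as sb:
        completed, error = run_steps([
            ("open page", lambda: sb.driver.uc_open_with_reconnect(url, 4)),
            ("click sign-in", lambda: sb.driver.uc_click("button.btn-signin", 4)),
        ])
        if error is not None:
            print(f"stopped at {error[0]!r} after completing {completed}")
```

Calling `main()` runs the two steps against a real browser; `run_steps` itself has no browser dependency, so it can be tried with plain functions first.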

flairekq commented 2 months ago

Oh yes, sorry, I made a typo when I edited my code for the comment here.

My actual code is:

from seleniumbase import SB

base_url = "https://www.thaiticketmajor.com/"
first_site_to_hit = "all-event/"

def is_pass_cloudflare(sb):
    # Return True if the Cloudflare success marker is present in the current frame.
    try:
        sb.driver.assert_element("div#success")
        return True
    except Exception:
        print("[verify_cloudflare_success] failed")
        return False

def main():
    with SB(uc=True) as sb:
        print("connecting to url")
        sb.driver.uc_open_with_reconnect(base_url + first_site_to_hit, 4)
        print("connected")

        sb.driver.uc_click("button.btn-signin", 4)

        with sb.frame_switch('iframe[src*="challenge"]'):
            is_pass = is_pass_cloudflare(sb)
            if not is_pass:
                # try to manually click and test whether pass
                sb.driver.uc_click("span.mark", 4)
                print("manually clicked cloudflare")
                sb.sleep(2)
                with sb.frame_switch('iframe[src*="challenge"]'):
                    is_pass = is_pass_cloudflare(sb)
                    if not is_pass:
                        print("failed cloudflare")
                        return 
            print("passed cloudflare")

main()

mdmintz commented 2 months ago

Try some of the non-UC Mode examples. Running from Docker will likely expose that automation is being used, which will get you detected.

flairekq commented 2 months ago

Hi @mdmintz, I managed to run my UC Mode code (the same code as above) in a local Docker container on Windows. However, when I ran the container on my M2 Mac, it hung at UC-related calls such as uc_click (though not at uc_open_with_reconnect). I can't switch to non-UC Mode because it fails the Cloudflare check. Do you have any advice?

I followed the ([v4.25.4] Readme) and enabled Rosetta in Docker Desktop.

mdmintz commented 2 months ago

I would run UC Mode outside of Docker, as Docker adds things that make bots detectable, and it's not easy to cover it up.
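
A script can at least warn when UC Mode is about to run inside a container. The sketch below is a best-effort heuristic under stated assumptions: it checks for the `/.dockerenv` file and for the word "docker" in `/proc/1/cgroup`, neither of which is guaranteed on every Docker setup. `running_in_docker` and `uc_mode_warning` are hypothetical helper names, not SeleniumBase APIs.

```python
import os


def running_in_docker(dockerenv_path="/.dockerenv", cgroup_path="/proc/1/cgroup"):
    """Best-effort guess at whether this process runs in a Docker container."""
    # Docker normally creates this marker file at the container root.
    if os.path.exists(dockerenv_path):
        return True
    # Fallback: some setups list "docker" in the init process's cgroups.
    try:
        with open(cgroup_path, encoding="utf-8") as handle:
            return "docker" in handle.read()
    except OSError:
        # File missing or unreadable (for example, on macOS or Windows hosts).
        return False


def uc_mode_warning(uc_enabled, in_docker=None):
    """Return a warning string if UC Mode is used inside Docker, else None."""
    if in_docker is None:
        in_docker = running_in_docker()
    if uc_enabled and in_docker:
        return "UC Mode inside Docker is more likely to be detected as a bot."
    return None
```

For example, printing `uc_mode_warning(uc_enabled=True)` before entering `with SB(uc=True) as sb:` makes the environment visible in the logs of a run that later hangs.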

evan1108 commented 2 months ago

@flairekq I'm facing the same issues with my dockerized app, where I use the SB context manager in undetectable (UC) and headless mode. When I run the container locally, it hangs at uc_open_with_reconnect. Even when I run my code directly on my local machine (i.e. not in the container), the app crashes with the selenium.common.exceptions.NoSuchWindowException: Message: Active window was already closed! error.

I'm running my scraping code from a django-rq job (including when I run it locally), so maybe that makes it easy for sites to detect automation? That said, I'm only assuming that the target site is what causes the app to crash and lose its SeleniumBase connection. I don't see anything different on the webpage.

It seems spotty: some runs get farther than others.
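
For runs that fail intermittently, one possible coping strategy (an assumption here, not something recommended in this thread) is to retry the whole job with a fresh browser session each time, while still printing every failure. In the sketch below, `retry_with_fresh_session` and `scrape_once` are hypothetical names, and the `url` default is only a placeholder taken from the original post.

```python
import time


def retry_with_fresh_session(task, attempts=3, retry_on=(Exception,), delay=0.0):
    """Call task(attempt) until it succeeds or the attempts run out.

    Only exceptions listed in retry_on trigger a retry; each failure is
    printed so it is not silently swallowed. The last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task(attempt)
        except retry_on as exc:
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc!r}")
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


def scrape_once(attempt, url="https://www.thaiticketmajor.com/all-event/"):
    # Imported here so the retry helper can be used without SeleniumBase.
    from seleniumbase import SB

    # A new SB context per attempt means a new browser session per attempt.
    with SB(uc=True) as sb:
        sb.driver.uc_open_with_reconnect(url, 4)
        return sb.get_title()
```

A call such as `retry_with_fresh_session(scrape_once, retry_on=(InvalidSessionIdException, NoSuchWindowException))`, with both classes imported from `selenium.common.exceptions`, limits retries to the two errors reported in this thread.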