omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License

Chrome does not `quit()` properly #60

Closed: dnguyenSL closed this issue 7 months ago

dnguyenSL commented 7 months ago

Hi omkarcloud, I really appreciate your project for getting past Cloudflare detection.

Context

Do you have any suggestions for me?

Environment 1:

OS: macOS
Proxy: datacenter IPs
Problem: the library closes the browser windows, but the Chrome processes keep running.

[Screenshot, 2024-02-07 at 5:21:41 PM]

Environment 2:

OS: Docker on Mac
Dockerfile: based on botasaurus-starter
Proxy: datacenter IPs
Memory usage: keeps going up after each request

My code:

from botasaurus import *
from typing import List
from botasaurus.create_stealth_driver import create_stealth_driver
import json

from pydantic import BaseModel

# from close import close_chrome  # local helper module (not shown); unused below

class CookieResponse(BaseModel):
    heading: str
    cookies: List[dict]
    chromeOptions: dict
    remoteAddress: str

def get_proxy(data):
    return data["proxy"]

class Input(BaseModel):
    proxy: str
    url: str | None = "https://www.instacart.com/"

# My web APIs call this function
def scape_cookies(input: Input) -> CookieResponse: 
    pid = None
    @browser(
        create_driver=create_stealth_driver(
            start_url=input["url"],
        ),
        max_retry=3,
        proxy=input["proxy"],
    )
    def scrape_website_args(driver: AntiDetectDriver, data) -> CookieResponse:
        heading = driver.text('h1')
        cookies = driver.get_cookies()
        serialized_data = json.dumps(cookies)
        nonlocal pid
        pid = driver.service.process.pid
        # I tried these three calls, but none of them works:
        # driver.service.process.kill()
        # driver.service.process.terminate()
        # driver.quit()

        return {
            "heading": heading,
            "cookies": cookies,
            "chromeOptions": driver.capabilities['goog:chromeOptions'],
        }

    response = scrape_website_args(input)
    print(response)
    # scrape_website_args.close() also does not work, even with reuse_driver=True and keep_driver_alive=True
    return response

if __name__ == "__main__":
    response = scape_cookies(
        {
            "proxy": "proxy here",
            "url": "https://www.instacart.com/",
        }
    )

Chetan11-dev commented 7 months ago

Does using this function work?

import subprocess
import platform

def kill_process_by_pid(pid):
    if pid is None:
        raise ValueError("A PID must be provided")

    os_name = platform.system()
    try:
        if os_name == 'Windows':
            subprocess.run(['taskkill', '/PID', str(pid), '/F'], check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        else:  # macOS and Linux share the same command for killing a process by PID
            subprocess.run(['kill', str(pid)], check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        print(f"Process with PID {pid} could not be terminated or does not exist.")
    except Exception as e:
        print(f"An error occurred while trying to kill the process with PID {pid}: {e}")

# Example usage:
# kill_process_by_pid(driver.service.process.pid)  # pass the PID of the process you want to kill
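For comparison, here is a minimal sketch of the same helper that calls `os.kill` directly instead of spawning the `kill` or `taskkill` command. It is a hypothetical rewrite for illustration, not code from botasaurus. It sends SIGTERM by default (the same signal plain `kill` sends); on Windows, `os.kill` with this signal terminates the process through the TerminateProcess API. Like the original helper, it signals only the one PID, not any child processes that process started.

```python
import os
import signal
from typing import Optional


def kill_process_by_pid(pid: Optional[int], force: bool = False) -> bool:
    """Signal a single process; return True if the signal was delivered.

    Hypothetical stdlib-only variant: sends SIGTERM (the default signal of
    the `kill` command), or SIGKILL when force=True.
    """
    if pid is None:
        raise ValueError("A PID must be provided")

    sig = signal.SIGTERM
    if force:
        # SIGKILL does not exist on Windows; there, SIGTERM already maps to
        # an unconditional TerminateProcess call.
        sig = getattr(signal, "SIGKILL", signal.SIGTERM)

    try:
        os.kill(pid, sig)
    except OSError:
        # No such process, or no permission to signal it.
        return False
    return True
```

Returning a boolean instead of printing lets the caller decide whether a missing process is worth logging.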

Also, let me know whether the issue happens only with the stealth driver, or whether the normal driver has it too.

VincentDoreau13 commented 7 months ago

Good morning, I encountered the same issue. The issue stems from the stealth driver, which initiates a Chrome process but fails to close it properly.


I resolved this locally by killing the Chrome process with kill_process_by_pid() at the end of each iteration, right after calling the driver's quit() method.

To do this, I modified do_create_stealth_driver() in create_stealth_driver.py to record the Chrome PID on the driver: `remote_driver.chrome_driver = chrome.pid`
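The workaround described here can be sketched as a small wrapper that always runs `quit()` and then terminates the recorded Chrome PID, even if `quit()` raises. `quit_and_kill` is a hypothetical helper, not part of botasaurus, and it assumes the PID was stored on the driver at creation time as described (`remote_driver.chrome_driver`).

```python
import os
import signal
from typing import Any, Optional


def quit_and_kill(driver: Any, chrome_pid: Optional[int]) -> None:
    """Call driver.quit(), then terminate the recorded Chrome process.

    Hypothetical helper: chrome_pid is assumed to have been saved when the
    driver was created (e.g. as remote_driver.chrome_driver).
    """
    try:
        driver.quit()
    finally:
        # Runs even if quit() raised, so the Chrome process is never left behind.
        if chrome_pid is not None:
            try:
                os.kill(chrome_pid, signal.SIGTERM)
            except OSError:
                # Already exited, or we lack permission to signal it.
                pass
```

Usage, assuming the patched driver exposes the attribute: `quit_and_kill(driver, getattr(driver, "chrome_driver", None))`. Using `try`/`finally` means an exception from `quit()` still propagates to the caller, but only after the kill has been attempted.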

Chetan11-dev commented 7 months ago

@VincentDoreau13 and @dnguyenSL, I have released a bug fix. Kindly run:

python -m pip install botasaurus --upgrade

@VincentDoreau13, could you let me know whether this fix resolves the issue?

VincentDoreau13 commented 7 months ago

It works for me; the problem is solved! Thanks!