kaliiiiiiiiii / Selenium-Driverless

undetected Selenium without usage of chromedriver
https://kaliiiiiiiiii.github.io/Selenium-Driverless/

driver.get throws TimeoutError on fragment-only URL changes #139

Closed milahu closed 7 months ago

milahu commented 7 months ago

Consider this HTML page:

    <h1 id="one">one</h1>
    <h2 id="two">two</h2>

When I load that page with driver.get("page.html"), everything is fine.

When I then try to load the #two section with driver.get("page.html#two"), driver.get hangs and throws a TimeoutError.

Maybe there is a better method to change only the fragment ID of the URL (#two), but this should also work with driver.get.

kaliiiiiiiiii commented 7 months ago

Note: I edited your description so that it is numbered as 2 issues.

  1. Makes sense to me why it occurs. Basically, the page load event doesn't get fired. It should be possible to cover this by comparing with the previous URL.
  2. Is a bit more complicated. Whether a download gets dispatched generally depends on the MIME type of the webpage's response. However, I want to avoid tracking the requests if possible, as I suspect there to be some leaks. Another possibility is to track Page.downloadWillBegin. However, that event is flagged as deprecated, and therefore might not be reliable.
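
The URL comparison suggested in point 1 could look roughly like the sketch below. It is only an illustration, not the library's code: `is_fragment_only_change` is a hypothetical helper name, and it assumes both URLs are already absolute.

```python
from urllib.parse import urldefrag


def is_fragment_only_change(current_url: str, target_url: str) -> bool:
    """Return True if navigating to target_url would only change the fragment.

    Hypothetical helper (not part of Selenium-Driverless).
    Assumes both URLs are absolute.
    """
    # A target without "#" is a normal navigation, even if the base matches.
    if "#" not in target_url:
        return False
    # urldefrag splits a URL into (url_without_fragment, fragment).
    current_base, _ = urldefrag(current_url)
    target_base, _ = urldefrag(target_url)
    return current_base == target_base
```

When this returns True, no page load event is expected, so the caller can skip waiting for it instead of running into the timeout.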

for now, I'm pretty sure that:

driver.get(url, wait_load=False)

should work just fine.

milahu commented 7 months ago

> It should be possible to cover this by comparing with the previous url.

like this?

this also allows driver.get("#some-id") to focus a section on the current page

    async def get(self, url: str, referrer: str = None, wait_load: bool = True, timeout: float = 30) -> None:
        """Loads a web page in the current browser session."""
        if "#" in url:
            # strip any existing fragment from the current url
            current_url_base = (await self.current_url).split("#")[0]
            if url[0] == "#":
                # allow navigating by fragment ID only, relative to the current url
                url = current_url_base + url
                print(f"appending fragment ID to current base url: {url}")
                wait_load = False
            elif url.split("#")[0] == current_url_base:
                # don't wait for a fragment-only url change (no load event is fired)
                print(f"not waiting for fragment-only url change: {url}")
                wait_load = False
        await self.current_target.get(url=url, referrer=referrer, wait_load=wait_load, timeout=timeout)

    # monkey patch
    driver.get = get.__get__(driver)

kaliiiiiiiiii commented 7 months ago

I suppose so, yes. Looks good to me. The implementation, however, should be in Target.get and not in Driver.get. If you want, you can open an MR. Otherwise I'll just implement it myself.
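
To illustrate what moving the check down to the target level might look like, here is a minimal sketch. `FakeTarget` and `_navigate` are hypothetical stand-ins, not the real Target class or its internals; the sketch also swaps the string concatenation for `urllib.parse.urljoin`, and assumes http(s) or file URLs.

```python
import asyncio
from urllib.parse import urldefrag, urljoin


class FakeTarget:
    """Hypothetical stand-in for a Target; it only records what it is asked to do."""

    def __init__(self, current_url: str) -> None:
        self.current_url = current_url
        self.calls = []  # (url, wait_load) pairs, for inspection

    async def _navigate(self, url: str, wait_load: bool) -> None:
        # Placeholder for the real navigation; here we only record the call.
        self.calls.append((url, wait_load))
        self.current_url = url

    async def get(self, url: str, wait_load: bool = True) -> None:
        if "#" in url:
            # urljoin resolves a bare "#id" against the current URL
            # and leaves absolute URLs unchanged.
            url = urljoin(self.current_url, url)
            if urldefrag(url).url == urldefrag(self.current_url).url:
                # Fragment-only change: there is no load event to wait for.
                wait_load = False
        await self._navigate(url, wait_load)


async def demo():
    target = FakeTarget("http://localhost/page.html")
    await target.get("#two")  # fragment-only, resolved against current URL
    await target.get("http://localhost/other.html#top")  # different page
    return target.calls


calls = asyncio.run(demo())
```

With the check at this level, a driver-level get could keep delegating to the current target unchanged, as the monkey patch above already does.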

milahu commented 7 months ago

just copy, paste, commit... all my work is public domain / MIT license