kaliiiiiiiiii / Selenium-Driverless

undetected Selenium without usage of chromedriver
https://kaliiiiiiiiii.github.io/Selenium-Driverless/

get_target_for_iframe fails on file urls: NoSuchIframe: no target for iframe found #144

Closed milahu closed 5 months ago

milahu commented 6 months ago

low priority, because this affects only file urls

get_target_for_iframe works on http urls, where the iframe source is https://www.recaptcha.net/recaptcha/api2/anchor?...

but get_target_for_iframe fails on file urls

driver.targets has no target with target.type == "iframe" (or "frame")

maybe because of the cross-origin policy?

also, driver.find_element(By.TAG_NAME, "iframe") fails to find the iframe element, while all other variants of driver.find_element work

Selenium-Driverless version 1.7.1

test-selenium-driverless.switch-to-iframe.py

```py
#!/usr/bin/env python3

import asyncio
import base64
import sys
import os
import time
import datetime
import traceback
import shutil

from selenium_driverless import webdriver
from selenium_driverless.types.by import By
from cdp_socket.exceptions import CDPError

# TODO use data urls instead of tempfiles

# use tmpfs in RAM to avoid disk writes
tempdir = f"/run/user/{os.getuid()}"
if not os.path.exists(tempdir):
    raise ValueError(f"tempdir does not exist: {tempdir}")

def datetime_str():
    # https://stackoverflow.com/questions/2150739/iso-time-iso-8601-in-python#28147286
    return datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")

tempdir += "/test-selenium-driverless"
print(f"TODO: rm -rf {tempdir}.*")
tempdir += f".{datetime_str()}"
print(f"tempdir: {tempdir}")
os.makedirs(tempdir)

async def main():
    options = webdriver.ChromeOptions()

    iframe_html_path = f"{tempdir}/iframe.html"
    with open(iframe_html_path, "w") as f:
        f.write(
            # HTML tags were stripped from this listing; visible text: "iframe"
            "iframe"
        )

    test_html_path = f"{tempdir}/test.html"
    with open(test_html_path, "w") as f:
        f.write(
            # HTML tags were stripped from this listing; visible text: "test"
            "test"
            f""
        )

    async with webdriver.Chrome(options=options, max_ws_size=2 ** 30) as driver:
        url = "file://" + test_html_path
        print(f"url: {url}")
        await driver.get(url)
        # wait for page load
        await asyncio.sleep(1)
        # find ...
        # (the rest of the script is missing from this listing)
```

output

```
searching h2 in iframe.html with iframe.content_document.find_element + By.CSS_SELECTOR
found h2: WebElement("HTMLHeadingElement", obj_id=2979415939007923755.5.3, node_id="None", backend_node_id=10, context_id=5)
searching h2 in iframe.html with iframe.content_document.find_element + By.TAG_NAME
found h2: WebElement("None", obj_id=None, node_id="None", backend_node_id=10, context_id=None)
searching iframe target with driver.get_target_for_iframe
driver.get_target_for_iframe failed: no target for iframe found
searching iframe target with driver.get_targets_for_iframes
driver.get_targets_for_iframes failed: list index out of range
searching iframe target with driver.targets
driver.targets[0] 1 = 'DA88C91EB79C08AAD75E264FCABFEC56'
driver.targets[0] 2 =
driver.targets[0].type = 'service_worker'
driver.targets[0].window_id -> {'code': -32000, 'message': 'No web contents in the target'}
driver.targets[0]._parent_target = None
driver.targets[1] 1 = '1AD1C7B5EFAD2F4AB445652C92804C0F'
driver.targets[1] 2 =
driver.targets[1].type = 'page'
driver.targets[1].window_id = 2067196537
driver.targets[1]._parent_target = None
iframe_target: None
switching to iframe
driver.switch_to.frame failed: no target for iframe found
driver.switch_to.frame failed: no target for iframe found
driver.switch_to.frame failed: no target for iframe found
driver_url: file:///run/user/1000/test-selenium-driverless.20240110T095725.677589Z/test.html
page_source:

test

searching h2 in iframe.html
driver.find_element failed:
switching to default content
driver_url: file:///run/user/1000/test-selenium-driverless.20240110T095725.677589Z/test.html
```

not fixed by #68 #7 #9

kaliiiiiiiiii commented 6 months ago

uhh have you tried

iframes = await driver.find_elements(By.TAG_NAME, "iframe")
await asyncio.sleep(0.5)
iframe_document = await iframes[0].content_document
# iframe_document.find_elements(...)

? The general rule here: only cross-origin (CORS) iframes get their own target, because Chrome runs them as out-of-process iframes (OOPIF). See https://www.chromium.org/developers/design-documents/site-isolation/#project-tasks

https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/24a3513305f833fac600ec8e31bcd5e9df955162/src/selenium_driverless/types/webelement.py#L191-L226
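To make the rule of thumb above concrete, here is a minimal sketch that guesses whether an iframe is likely to have its own target by comparing the origins of the page and the iframe. The helper names `origin_of` and `probably_has_own_target` are hypothetical and not part of Selenium-Driverless, and `https://example.com/` is just a placeholder page URL. Chrome's site isolation groups frames by site rather than by exact origin, so treat the result as a rough hint only.

```python
from urllib.parse import urlsplit

# Default ports, so that "https://host/" and "https://host:443/" compare equal.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return a (scheme, host, port) tuple for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def probably_has_own_target(page_url, iframe_url):
    """Heuristic only: a cross-origin iframe may be rendered out of process.

    Assumption: origin comparison approximates Chrome's per-site isolation.
    """
    return origin_of(page_url) != origin_of(iframe_url)


# file urls have no host, so both pages end up with the same tuple
print(probably_has_own_target(
    "file:///run/user/1000/test/test.html",
    "file:///run/user/1000/test/iframe.html",
))
# a captcha iframe embedded in some other site
print(probably_has_own_target(
    "https://example.com/",
    "https://www.recaptcha.net/recaptcha/api2/anchor",
))
```

With the file urls from this issue, both documents compare as the same origin, which matches the observation that no iframe target shows up.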

milahu commented 6 months ago

> driver.find_elements(By.TAG_NAME, "iframe")

By.TAG_NAME is the only driver.find_element variant that fails in my code

> only cors iframes have a target due to OOPIF

yep, i kind-of expected that

ideally, the interface should be the same for all iframes, so i don't need code like

if is_cors_iframe(iframe):
    # switch target
    old_target = ...
    await driver.switch_to.frame(iframe)
    elem = await driver.find_element(...)
    # switch back
    await driver.switch_to.frame(old_target)
else:
    elem = await iframe.content_document.find_element(...)

probably, these target-switches are expensive so a context handler would be nice

with driver.context_of.frame(iframe) as iframe_driver:
    elem = await iframe_driver.find_element(...)

kaliiiiiiiiii commented 5 months ago

> driver.find_elements(By.TAG_NAME, "iframe")
>
> By.TAG_NAME is the only driver.find_element variant that fails in my code

please specify "fails"? Does iframe.content_document work now btw?

> ideally, the interface should be the same for all iframes so i dont need code like
>
>     if is_cors_iframe(iframe):
>         # switch target
>         old_target = ...
>         await driver.switch_to.frame(iframe)
>         elem = await driver.find_element(...)
>         # switch back
>         await driver.switch_to.frame(old_target)
>     else:
>         elem = await iframe.content_document.find_element(...)
>
> probably, these target-switches are expensive so a context handler would be nice
>
>     with driver.context_of.frame(iframe) as iframe_driver:
>         elem = await iframe_driver.find_element(...)

well, I don't like that switching thingy anyway. My long-term plan is to deprecate it and move away from Selenium. Ideally, in my opinion, there should be a class Frame, of which a target can contain multiple. That's gonna require a lot of time & refactoring to develop tho. Also, driver.switch_to.frame only supports a Target as an argument. I forgot to remove/deprecate it after introducing .content_document

milahu commented 5 months ago

yes : )

iframe.content_document works for my simple test case with file urls and it also works for captcha iframes

iframe = await driver.find_element(By.CSS_SELECTOR, "iframe")
iframe_doc = await iframe.content_document
elem = await iframe_doc.find_element(By.CSS_SELECTOR, "h2")
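Since content_document works for both kinds of iframe here, the three lines above could be wrapped in one small helper. This is only a sketch: `find_in_iframe` and its `settle` parameter are hypothetical names, not part of Selenium-Driverless, and the only library calls it assumes are `find_element` and `content_document` as used in this thread.

```python
import asyncio


async def find_in_iframe(driver, iframe_by, iframe_value, by, value, settle=0.5):
    """Find an element inside the first iframe matching (iframe_by, iframe_value).

    Hypothetical helper. It relies only on find_element and content_document,
    so no target switching is involved.
    """
    iframe = await driver.find_element(iframe_by, iframe_value)
    # Give the frame a moment to load, mirroring the sleep suggested earlier
    # in the thread. The 0.5 s default is an assumption, not a tuned value.
    await asyncio.sleep(settle)
    iframe_doc = await iframe.content_document
    return await iframe_doc.find_element(by, value)
```

Usage would look like `elem = await find_in_iframe(driver, By.CSS_SELECTOR, "iframe", By.CSS_SELECTOR, "h2")`.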

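driver.find_element(By.TAG_NAME, "iframe") fails because it returns an unresolved element (the "-" line) instead of a proper HTMLIFrameElement (the "+" line):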

-  WebElement("None", obj_id=None, node_id="None", backend_node_id=10, context_id=None)
+  WebElement("HTMLIFrameElement", obj_id=-975510912079378788.4.3, node_id="None", backend_node_id=8, context_id=4)