milahu closed this issue 6 months ago
Hmm, don't bare images, for example, get embedded/parsed automatically into HTML? If I remember correctly, inspecting a page such as http://httpbin.org/get shows the text embedded within HTML.
possible solution: capture all responses by default, see also #123. then driver.response_bytes could return the original response bytes
Yep, thought about that as well. However, using Fetch for that would then interfere with users trying to implement request interception themselves.
Options I'd see here are:
Network.enable internally (passive), and advise using Fetch.enable for users. However, I'm not sure whether they could still interfere. And also, it's deprecated :/
Fetch.enable is actually per websocket, and not global per TargetId
@milahu
Also, wouldn't Page.getResourceContent be worth considering here as well? Why not use this one?
Page.getResourceContent
yes : )
# $ python3 -m asyncio
from selenium_driverless import webdriver
from selenium_driverless.types.by import By
driver = await webdriver.Chrome()
url = "http://httpbin.org/get"
await driver.get(url)
target = await driver.current_target
frame_id = target.id
args = { "frameId": frame_id, "url": url, }
res = await target.execute_cdp_cmd("Page.getResourceContent", args)
res["content"]
# '{\n "args": {}, \n ........ \n "url": "http://httpbin.org/get"\n}\n'
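For binary resources, the `Page.getResourceContent` result carries a `base64Encoded` flag next to `content`, so getting bytes back needs one more step. A minimal sketch, assuming `res` is the dict returned by the call above; the helper name `resource_content_to_bytes` is hypothetical and not part of selenium_driverless:

```python
import base64


def resource_content_to_bytes(res, encoding="utf-8"):
    """Turn a Page.getResourceContent result dict into bytes.

    Hypothetical helper, not a selenium_driverless API.
    """
    content = res["content"]
    if res.get("base64Encoded"):
        # Binary resources (images, PDFs, ...) arrive base64-encoded.
        return base64.b64decode(content)
    # Text resources arrive already decoded to str; re-encoding is an
    # assumption and may differ from the bytes originally sent on the wire.
    return content.encode(encoding)
```

Note that for text resources this is only an approximation of the original response bytes, since the browser has already decoded them.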
Could not find node with given id
Not sure where that error came from. In the REPL, it just works
# $ python3 -m asyncio
from selenium_driverless import webdriver
from selenium_driverless.types.by import By
driver = await webdriver.Chrome()
url = "http://httpbin.org/get"
await driver.get(url)
await driver.page_source
# '<html><head><meta name="color-scheme" content="light dark"></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{\n "args": {}, \n ........ \n "url": "http://httpbin.org/get"\n}\n</pre></body></html>'
elem = await driver.find_element(By.XPATH, "/html/body/pre")
await elem.text
# '{\n "args": {}, \n ........ \n "url": "http://httpbin.org/get"\n}\n'
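If only `driver.page_source` is available, the text can also be pulled out of the `<pre>` wrapper that the browser adds around plain-text responses, as seen in the output above. A rough sketch using only the standard library; the function name `text_from_pre_wrapper` is made up for illustration, and the result is the rendered text, not the original response bytes:

```python
from html.parser import HTMLParser


class _PreTextExtractor(HTMLParser):
    """Collects the text found inside <pre> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self.seen_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1
            self.seen_pre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Entities such as &lt; are already unescaped here.
        if self._depth > 0:
            self.chunks.append(data)


def text_from_pre_wrapper(page_source):
    """Return the text inside <pre>, or None if there is no <pre>."""
    parser = _PreTextExtractor()
    parser.feed(page_source)
    parser.close()
    if not parser.seen_pre:
        return None
    return "".join(parser.chunks)
```

This is equivalent in spirit to the `find_element` + `elem.text` approach above, but works offline on a saved page source.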
Fetch.enable
yes, see https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/123#issuecomment-1890393341
driver.page_source fails on non-HTML pages like http://httpbin.org/get
obviously, DOM.getOuterHTML works only on HTML pages
different from #127
javascript to the rescue...
possible solution: Page.FrameResource should give the mime type
possible solution:
document.body.innerText gives the text of plain text pages
document.body.firstChild.tagName == "PRE"
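On the mime type idea: `Page.getResourceTree` returns a `frameTree` whose frames and `resources` entries each carry a `mimeType`. A small sketch of looking up the mime type for a URL in such a result; the function name `find_mime_type` is hypothetical, and the input is assumed to be the plain dict returned by the CDP call:

```python
def find_mime_type(resource_tree, url):
    """Search a Page.getResourceTree result for the mimeType of `url`.

    Hypothetical helper. Returns None if the URL is not found.
    """

    def walk(node):
        # The document of a frame is described by the frame itself.
        frame = node.get("frame", {})
        if frame.get("url") == url:
            return frame.get("mimeType")
        # Subresources (images, scripts, ...) are listed separately.
        for res in node.get("resources", []):
            if res.get("url") == url:
                return res.get("mimeType")
        for child in node.get("childFrames", []):
            found = walk(child)
            if found is not None:
                return found
        return None

    return walk(resource_tree["frameTree"])
```

With selenium_driverless this would presumably be fed from `await target.execute_cdp_cmd("Page.getResourceTree", {})`, in the same way as the `Page.getResourceContent` call earlier in the thread.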
different mimetypes will need different solutions
image/jpeg, image/png, image/*: Javascript: how to get image as bytes from a page (without redownloading)
application/pdf: ?
video/mp4, video/*: ?
possible solution: capture all responses by default, see also #123. then driver.response_bytes could return the original response bytes
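The "different mimetypes, different solutions" list could be sketched as a simple pattern table. Everything here is an assumption for illustration: the strategy labels are made up, and the fallback stands for the proposed (not existing) `driver.response_bytes`:

```python
from fnmatch import fnmatchcase

# (mime pattern, hypothetical strategy label), checked in order.
STRATEGIES = [
    ("text/html", "page_source"),
    ("text/*", "body_inner_text"),
    ("application/json", "body_inner_text"),
    ("image/*", "image_bytes_via_js"),
]

# Used for types without a known solution yet (application/pdf, video/*).
FALLBACK = "raw_response_bytes"


def pick_strategy(mime_type):
    """Map a mime type (optionally with parameters) to a strategy label."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = mime_type.split(";", 1)[0].strip().lower()
    for pattern, strategy in STRATEGIES:
        if fnmatchcase(mime, pattern):
            return strategy
    return FALLBACK
```

The ordering matters: `text/html` has to come before `text/*`, otherwise HTML pages would be routed to the plain-text path.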