Closed milahu closed 6 months ago
but download_will_begin is never called
thanks for testing
await driver.get("chrome://settings/downloads") # click on #changeDownloadsPath # enter new downloads folder # click OK
No need fo hackish stuff - there's Browser.setDownloadBehavior
to set the path. Would however prefer a static download path.
another possible solution: run chromium in verbose mode and parse stdout log messages
That causes issues as subprocess.PIPE
can make chrome hang when getting a lot of messages. see https://github.com/electron/electron/issues/35399#issuecomment-1751523513
And yeah - ofc we'd run a thread to processs the pipe stdout. However, I'm not sure about timing etc. as well.
chrome://downloads/
uh yeah, likely would even work with passing JS events. However, I'm not sure if that's gonna work with contexts (incognito) and ofc, we preferably don't want a open tab all the time.
all of those approaches seem a bit "hackish" to me. More convincing to me are here passive request events as discussed at https://github.com/kaliiiiiiiiii/Selenium-Driverless/issues/148#issuecomment-1889808446 as well.
Also: Possibly, the 20s timeout at https://github.com/kaliiiiiiiiii/Selenium-Driverless/blob/24a3513305f833fac600ec8e31bcd5e9df955162/src/selenium_driverless/types/target.py#L228 should not be hardcoded. (assumption: CMD returns when network request is done, relevant for large file downloads)
download_will_begin
is never called
1. we must enable download events with Browser.setDownloadBehavior
eventsEnabled - boolean - Whether to emit download events (defaults to false).
2. we must send these CDP commands to driver.base_target
not to driver
and not to driver.current_target
then the Browser.downloadWillBegin
event is fired
passive request events
file downloads are not available to Network.getResponseBody
- it just hangs forever
i could parse the original filename from the content-disposition
header
of the Network.responseReceived
event, but i dont get the actual downloaded file path
in my aiohttp_chromium im using behavior = "allowAndName"
to use the download's guid as basename in the downloads folder
to avoid filename collisions, and to get stable filepaths
base_target = await driver.base_target
args = dict(
behavior = "allowAndName", # use guid as basename
downloadPath = self._downloads_path,
eventsEnabled = True, # enable download events
)
await base_target.execute_cdp_cmd("Browser.setDownloadBehavior", args)
await base_target.add_cdp_listener("Browser.downloadWillBegin", downloadWillBegin)
await base_target.add_cdp_listener("Browser.downloadProgress", downloadProgress)
in the downloadWillBegin
event, i get the filename as args["suggestedFilename"]
so this is not too complex, but requires setting up some event listeners which could interfere with listeners added by the user so maybe this is out of scope for Selenium-Driverless
- we must enable download events with Browser.setDownloadBehavior
Ah yep that's good to know, thanks.
so this is not too complex, but requires setting up some event listeners which could interfere with listeners added by the user so maybe this is out of scope for Selenium-Driverless
Actually, here I thought about having 2 sockets for each target. One for users and one for internals. This won't resolve all interfaces (such as network interception), however, event subscriptions are isolated to me knowledge.
i suggest to close this as "wontfix"
i assume that the original selenium
has the same limitation
and selenium_driverless
should be a drop-in replacement for selenium
(plus async
)
and this is too complex to solve in selenium
instead, consumers should wrap the driver.get
call
parse response headers, handle streams, handle CDP events, etc
(this is why i started aiohttp_chromium)
and selenium_driverless should be a drop-in replacement for selenium (plus async)
Actually, I plan to move away from selenium. Selenium it's syntax, not allowing simultanous targets, contexts, missing & unconventional structure just is a pain. That's why I'm currently working on addin some documentation (finally).
and this is too complex to solve in selenium instead, consumers should wrap the driver.get call parse response headers, handle streams, handle CDP events, etc
Actually, I was thinking about fixing:
Possibly, the 20s timeout at
Selenium-Driverless/src/selenium_driverless/types/target.py
Line 228 in 24a3513 in get
asyncio.create_task(self.execute_cdp_cmd("Page.navigate", args, timeout=20))
should not be hardcoded. (assumption: CMD returns when network request is done, relevant for large file downloads)
for sure, then adding an option at ChromeOptions
for the download behaviour and wait for first_completed at Browser.downloadWillBegin
or loadEventFired
So maybe let's keep this issue up for now such that I don't forget about:)
related:
before calling Network.getResponseBody
we should wait for the Network.loadingFinished
event
otherwise Network.getResponseBody
can return
only the first 65536 bytes (64KiB) of the response
see also https://github.com/ChromeDevTools/devtools-protocol/issues/44 https://github.com/milahu/aiohttp_chromium/commit/23d47d9142528df90b9b8e5cb8fb2470ea8e82f4
this is the problem with CDP events: the user may want to handle CDP events and then handling these events in selenium_driverless is "too much"
the user will want to handle both events:
Network.loadingFinished
for inline content
Browser.downloadWillBegin
for file downloads
...or do you want to handle both events in selenium_driverless?
related:
before calling
Network.getResponseBody
we should wait for theNetwork.loadingFinished
event otherwiseNetwork.getResponseBody
can return only the first 65536 bytes (64KiB) of the responsesee also ChromeDevTools/devtools-protocol#44 milahu/aiohttp_chromium@0fbd29c
this is the problem with CDP events: the user may want to handle CDP events and then handling these events in selenium_driverless is "too much"
=> multiple WS connections to a target considerable.
the user will want to handle both events:
Network.loadingFinished
for inline contentBrowser.downloadWillBegin
for file downloads...or do you want to handle both events in selenium_driverless?
hm doesn't Page.navigate
return after the full response has been received? At least for big file downloads, I noticed that it takes longer
doesn't
Page.navigate
return after the full response has been received?
aah, probably yes
i observed the "65536 bytes issue" when fetching assets (child requests)
when i open some url with
driver.get(url)
and the server returns a file download (Content-Disposition: attachment) thendriver.get
hangs and throws aTimeoutError
as a workaround, im checking the
~/Downloads
folder and wait until the file was writtenthe response can be either a file download, or an html error page, so
driver.get(url, wait_load=False)
would be wrongCDP has the experimental Browser.downloadWillBegin
but
download_will_begin
is never calledanother possible solution: use a different downloads folder for each download
another possible solution: run chromium in verbose mode and parse stdout log messages
another possible solution: monitor the
chrome://downloads/
page which has source urls and destination filepaths the page is based onchrome://downloads/downloads.js
see also How best to test file download links using Selenium see also Selenium Webdriver with Shadow DOM