Closed lowlighter closed 8 months ago
I think it would be great if you were able to enable parallel tests in this PR!
I suspect the issue is that --parallel
runs tests in different processes. You'd somehow need to create a file lock to tell other processes to "just wait until the binary is downloaded".
More hanging tests, unfortunate.
I think I understand what's happening, it's not possible to enable parallel currently because all the tests spawn the browsers on same port 9222, so all the tests conflicts together 😅
Oh that's definitely possible, let me see if I can make a PR that dynamically adapts the port.
Oh that's definitely possible, let me see if I can make a PR that dynamically adapts the port.
I think the way to do is it to let chrome choose the remote debugging port automatically, that's how puppeteer does it: https://github.com/puppeteer/puppeteer/blob/1a734257f203c5f73744a9f06a1d590aff9f8ac8/packages/browsers/src/launch.ts#L140C21-L140C21
export const CDP_WEBSOCKET_ENDPOINT_REGEX =
/^DevTools listening on (ws:\/\/.*)$/;
I think the way to do is it to let chrome choose the remote debugging port automatically
I did this originally, but the issue is that firefox's debug protocol prints a different message. Technically, there are other issues with FF support atm so maybe we just don't care about that for now and come back to it later. FF should be in 0.4.0 if all goes well.
Maybe it doesn't? I don't remember why I stopped doing that. Weird. I'll just do that. Give me ~5 minutes.
Still working on it. Turns out this is a slightly bigger refactor than I would have expected 😅
@lowlighter merged my PR. Could you try to update your branch?
@lowlighter merged my PR. Could you try to update your branch?
I'll rebase it 👍 Still making some changes on it
It seems to works on Linux now (don't know for mac os since it gets canceled before the result) Still needs to check for windows, not sure why the executable isn't able to boot somewhere along the tests
not sure why the executable isn't able to boot somewhere along the tests
This usually means that there was an error running the executable and it usually prints out a log. No idea why there isn't a log or why it's failing to boot. Strange. I can test your branch on MacOS right now.
All tests pass on my Mac.
It seems I can reproduce on Windows, not sure if it's separate errors, but it may be possible that when spawning the browser there is an exclusive lock that get attached to the DLL so it's not possible to create another instance of it ?
Wait for selector => ./tests/wait_test.ts:5:6
error: Error: The process cannot access the file because it is being used by another process. (os error 32): open '******\.astral\118.0.5943.0\chrome-win64\chrome.dll'
const file = await Deno.open(path, {
^
at async Object.open (ext:deno_fs/30_fs.js:574:15)
at async decompressArchive (file:///******/astral/src/cache.ts:111:20)
at async _getBinary (file:///******/astral/src/cache.ts:238:5)
at async launch (file:///******/astral/src/browser.ts:166:12)
at async file:///******/astral/tests/wait_test.ts:7:19
Wait for function => ./tests/wait_test.ts:20:6
error: Error: Your binary refused to boot
throw new Error("Your binary refused to boot");
^
at runCommand (file:///******/astral/src/browser.ts:50:11)
at eventLoopTick (ext:core/01_core.js:183:11)
at async launch (file:///******/astral/src/browser.ts:220:33)
at async file:///******/astral/tests/wait_test.ts:22:19
I think I'll check later, thanks for helping and for implementing #26
Sounds good. Let me know if you want me to investigate further (though I'd have to obtain a windows computer!).
I have experimented some last stuff before sleeping, ssems that on Windows for some reason the exit code 21 gets triggered when ran in parallel really quickly
It seems to be a "NETWORK_CHANGED" error:
https://source.chromium.org/chromium/chromium/src/+/main:net/base/net_error_list.h;l=90-91
I wonder if it's because there is special handling for the bindings that occurs on windows to reuse the same process 🤔 ?
One thing to note is that it doesn't occur with --headless=old
(though even if the tests passed it hangs on local machine)
The dirty fix I put needs to be removed, it was just for testing the CI.
It seems that all tests are "passing", but there are Web socket leaks on windows, I'm not sure where it does occurs. I thought it was maybe in page but apparently it had no effects
@lino-levan Ok I think I've mostly achieved to make everything work, but since this PR got quite huge I think I should split the different changes into smaller PR to make changes more trackable/reversible and clean some stuff so it's easier for you to review
I think there was a potential op leak in celestial bindings related to Websockets
While Websocket.close()
starts closing the websocket, it may actually not be closed instantly (I've confirmed by checking the ws.readyState
after and before, and sometimes you may have ws.CLOSING
state rather than Websocket.CLOSED
.
Looking at other people with same issue https://github.com/denoland/deno/issues/12815#issuecomment-974057915 it looks like it's intended that you look that the websocket has been closed yourself.
So maybe splitting this PR into the following ?
getBinary()
+ parallel testing Splitting the PRs would be great. Reviewing this (and reverting these changes if need be, as you mentioned), would definitely be a challenge. I appreciate the work!
Unless you really want to split this PR, outside of my one nit, this LGTM.
I'll split the PRs if you don't mind 👍
The lock file shouldn't be combined with the combined cache features since they're not entirely related, especially since I think that one day we may have Deno.flock()
not in unstable anymore so it'll be way easier to perform the cleanup
I think it's better for tracking changes anyways
Did the three PRs cover this one? Is there more that this PR does?
Edit: I guess the locking logic isn't implemented?
Did the three PRs cover this one? Is there more that this PR does?
Edit: I guess the locking logic isn't implemented?
Yeah it's still missing the lock feature, I'll rebase shortly, but I have a design question for you though
This pr actually has 2 "lock mechanisms", one promise based and the other lock file based
The promise one is intended to limit the download of a same browser to a single one within the same process (calling launch()
multiple times would just return the previous promise created, instead of starting a new install). Using promises has the advantage of not requiring any permissions since it's in memory
The lock file has been added because for parallelism in different process, it's not possible to use the promise mechanism. Technically the lock file also achieve the same as the promise based so we could use only a lock file, the only thing is that it requires write permissions (but technically I think the user would already have given write access to the astral cache)
Should I just keep the lock file then ?
I think just the lock file makes sense to me.
@lino-levan Ok this is finally ready ! Thanks again for your patience, hope I didn't disturb too much with my PRs 🙇
This improves
getBinary()
so if it's called multiples times for the same browser version, the install process will occur exactly once