feat: support shared browser install for concurrent `getBinary()` calls

lowlighter commented 9 months ago

This improves getBinary() so if it's called multiples times for the same browser version, the install process will occur exactly once

lino-levan commented 9 months ago

I think it would be great if you were able to enable parallel tests in this PR!

lino-levan commented 9 months ago

I suspect the issue is that --parallel runs tests in different processes. You'd somehow need to create a file lock to tell other processes to "just wait until the binary is downloaded".

lino-levan commented 9 months ago

More hanging tests, unfortunate.

lowlighter commented 9 months ago

I think I understand what's happening, it's not possible to enable parallel currently because all the tests spawn the browsers on same port 9222, so all the tests conflicts together 😅

lino-levan commented 9 months ago

Oh that's definitely possible, let me see if I can make a PR that dynamically adapts the port.

lowlighter commented 9 months ago

Oh that's definitely possible, let me see if I can make a PR that dynamically adapts the port.

I think the way to do is it to let chrome choose the remote debugging port automatically, that's how puppeteer does it: https://github.com/puppeteer/puppeteer/blob/1a734257f203c5f73744a9f06a1d590aff9f8ac8/packages/browsers/src/launch.ts#L140C21-L140C21

export const CDP_WEBSOCKET_ENDPOINT_REGEX =
  /^DevTools listening on (ws:\/\/.*)$/;

lino-levan commented 9 months ago

I think the way to do is it to let chrome choose the remote debugging port automatically

I did this originally, but the issue is that firefox's debug protocol prints a different message. Technically, there are other issues with FF support atm so maybe we just don't care about that for now and come back to it later. FF should be in 0.4.0 if all goes well.

lino-levan commented 9 months ago

Maybe it doesn't? I don't remember why I stopped doing that. Weird. I'll just do that. Give me ~5 minutes.

lino-levan commented 9 months ago

Still working on it. Turns out this is a slightly bigger refactor than I would have expected 😅

lino-levan commented 9 months ago

Let's see: https://github.com/lino-levan/astral/pull/26/files

lino-levan commented 9 months ago

@lowlighter merged my PR. Could you try to update your branch?

lowlighter commented 9 months ago

@lowlighter merged my PR. Could you try to update your branch?

I'll rebase it 👍 Still making some changes on it

lowlighter commented 9 months ago

It seems to works on Linux now (don't know for mac os since it gets canceled before the result) Still needs to check for windows, not sure why the executable isn't able to boot somewhere along the tests

lino-levan commented 9 months ago

not sure why the executable isn't able to boot somewhere along the tests

This usually means that there was an error running the executable and it usually prints out a log. No idea why there isn't a log or why it's failing to boot. Strange. I can test your branch on MacOS right now.

lino-levan commented 9 months ago

All tests pass on my Mac.

lowlighter commented 9 months ago

It seems I can reproduce on Windows, not sure if it's separate errors, but it may be possible that when spawning the browser there is an exclusive lock that get attached to the DLL so it's not possible to create another instance of it ?

Wait for selector => ./tests/wait_test.ts:5:6
error: Error: The process cannot access the file because it is being used by another process. (os error 32): open '******\.astral\118.0.5943.0\chrome-win64\chrome.dll'
      const file = await Deno.open(path, {
                   ^
    at async Object.open (ext:deno_fs/30_fs.js:574:15)
    at async decompressArchive (file:///******/astral/src/cache.ts:111:20)
    at async _getBinary (file:///******/astral/src/cache.ts:238:5)
    at async launch (file:///******/astral/src/browser.ts:166:12)
    at async file:///******/astral/tests/wait_test.ts:7:19

Wait for function => ./tests/wait_test.ts:20:6
error: Error: Your binary refused to boot
    throw new Error("Your binary refused to boot");
          ^
    at runCommand (file:///******/astral/src/browser.ts:50:11)
    at eventLoopTick (ext:core/01_core.js:183:11)
    at async launch (file:///******/astral/src/browser.ts:220:33)
    at async file:///******/astral/tests/wait_test.ts:22:19

I think I'll check later, thanks for helping and for implementing #26

lino-levan commented 9 months ago

Sounds good. Let me know if you want me to investigate further (though I'd have to obtain a windows computer!).

lowlighter commented 9 months ago

I have experimented some last stuff before sleeping, ssems that on Windows for some reason the exit code 21 gets triggered when ran in parallel really quickly

It seems to be a "NETWORK_CHANGED" error: https://source.chromium.org/chromium/chromium/src/+/main:net/base/net_error_list.h;l=90-91 I wonder if it's because there is special handling for the bindings that occurs on windows to reuse the same process 🤔 ? One thing to note is that it doesn't occur with --headless=old (though even if the tests passed it hangs on local machine)

The dirty fix I put needs to be removed, it was just for testing the CI.

It seems that all tests are "passing", but there are Web socket leaks on windows, I'm not sure where it does occurs. I thought it was maybe in page but apparently it had no effects

lowlighter commented 9 months ago

@lino-levan Ok I think I've mostly achieved to make everything work, but since this PR got quite huge I think I should split the different changes into smaller PR to make changes more trackable/reversible and clean some stuff so it's easier for you to review

I think there was a potential op leak in celestial bindings related to Websockets While Websocket.close() starts closing the websocket, it may actually not be closed instantly (I've confirmed by checking the ws.readyState after and before, and sometimes you may have ws.CLOSING state rather than Websocket.CLOSED. Looking at other people with same issue https://github.com/denoland/deno/issues/12815#issuecomment-974057915 it looks like it's intended that you look that the websocket has been closed yourself.

So maybe splitting this PR into the following ?

Fix potential websocket op leaks when closing
Fix Chrome SingletonLock crash if already deleted when trying to remove it
Fix astral quiet debug not enabled by default
Support different cache directory for installation
Lock file and singleton install for getBinary() + parallel testing

lino-levan commented 9 months ago

Splitting the PRs would be great. Reviewing this (and reverting these changes if need be, as you mentioned), would definitely be a challenge. I appreciate the work!

lino-levan commented 9 months ago

Unless you really want to split this PR, outside of my one nit, this LGTM.

lowlighter commented 9 months ago

I'll split the PRs if you don't mind 👍 The lock file shouldn't be combined with the combined cache features since they're not entirely related, especially since I think that one day we may have Deno.flock() not in unstable anymore so it'll be way easier to perform the cleanup I think it's better for tracking changes anyways

lino-levan commented 9 months ago

Did the three PRs cover this one? Is there more that this PR does?

Edit: I guess the locking logic isn't implemented?

lowlighter commented 9 months ago

Did the three PRs cover this one? Is there more that this PR does?

Edit: I guess the locking logic isn't implemented?

Yeah it's still missing the lock feature, I'll rebase shortly, but I have a design question for you though

This pr actually has 2 "lock mechanisms", one promise based and the other lock file based

The promise one is intended to limit the download of a same browser to a single one within the same process (calling launch() multiple times would just return the previous promise created, instead of starting a new install). Using promises has the advantage of not requiring any permissions since it's in memory

The lock file has been added because for parallelism in different process, it's not possible to use the promise mechanism. Technically the lock file also achieve the same as the promise based so we could use only a lock file, the only thing is that it requires write permissions (but technically I think the user would already have given write access to the astral cache)

Should I just keep the lock file then ?

lino-levan commented 9 months ago

I think just the lock file makes sense to me.

lowlighter commented 9 months ago

@lino-levan Ok this is finally ready ! Thanks again for your patience, hope I didn't disturb too much with my PRs 🙇

lino-levan / astral

feat: support shared browser install for concurrent `getBinary()` calls #25