percy / percy-agent

[Deprecated in favor of `@percy/cli`] An agent process for integrating with Percy.
https://percy.io
MIT License
22 stars 23 forks source link

`percy start --detached` errors due to race condition between process spawning and health check #580

Closed nathanshelly closed 3 years ago

nathanshelly commented 3 years ago

πŸ‘‹ First off thanks for your product! Recently introduced it at my company and have had nothing but positive feedback so far.

I noticed an odd race condition from percy start --detached between spawning the agent and running the health check. This causes the command to error but still start the process in the background. It only occurs with the --detached flag.

I'm trying to start the agent from node script using execa.command. Unfortunately this error causes execa to bail which kills the agent.

I investigated a naive fix which seems to have worked around the problem though definitely not in a clean way.

current behavior

Note: The PERCY_TOKEN in the following examples was created to be purposefully "leaked" here

Here is the current behavior:

without_sleep

Notice how the health check fails and I can see that nothing is running on the port immediately after it finishes but a second or two later it shows up in lsof's output. If I use a fake PERCY_TOKEN (e.g. PERCY_TOKEN=foo yarn percy start --detached) I get the same health check error but the process never starts (since there is no valid project to create a build for).

To reproduce this behavior simply run the following in a new directory:

> yarn init
> yarn add @percy/script
> PERCY_TOKEN=<fill-in-your-token-here> yarn percy start --detached

naive fix

I tested a naive fix by inserting a simple 2 second sleep between the call to run Percy detached and the failing health check. Basically I inserted the following on L60 of that same file:

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

await sleep(2000);

This avoids the race condition:

with_sleep

Note that when spawning via execa 2000ms isn't enough. Bumping up to 5000ms seemed to consistently succeed

I also tried awaiting the spawn call that I think is the culprit here but that didn't seem to have any effect.

wwilsman commented 3 years ago

Hey @nathanshelly!

The health check done by start is supposed to retry for a little rather than immediately fail, so I wonder what is causing it not to. πŸ€”

In a new upcoming Percy CLI tool, the order of how the process is started has changed which should avoid this issue entirely. Also, you can use the new core package directly in your script instead of having to spawn a subprocess. However, only a subset of our SDKs have been updated to take advantage of the new tool. You can see which SDKs have betas available in this issue β€” https://github.com/percy/cli/issues/57, and you can find new install steps in the respective SDK's readme.

If the SDK you're using doesn't have a beta yet, you might still be able to work around this in agent by not using the --detached flag. The main benefit that flag provides is so you can later use percy stop to finalize and stop the process. Without the flag, you can still stop the process by killing it which will in turn finalize the build before exiting. However, you'll still have to wait a little until the local server is ready to accept snapshots. You can do that by hitting the http://localhost:5338/percy/healthcheck endpoint yourself, or by using percy healthcheck.

nathanshelly commented 3 years ago

Thanks for the quick reply @wwilsman!

Not sure what's going on with the health check either but the new @percy/cli package is a way better fit for my use case anyway. Thanks for pointing it out!

Using @percy/cli directly without any other SDK I'm running into a separate issue I'm hoping you can shed some light on (happy to open a new issue if you'd prefer to avoid mixing these questions). Attempting to snapshot https://opendoor.com/developers w/ percy.capture({ name: 'developers', url: 'https://opendoor.com/developers' }) I'm getting the following error:

[percy] Encountered an error for page: https://opendoor.com/developers
[percy] Error: Evaluation failed: TypeError: Cannot read property 'trim' of undefined
    at s (/Users/nathan/work/go/src/github.com/opendoor-labs/code/js/packages/awl/node_modules/@percy/dom/dist/index.js:1:5438)
    at /Users/nathan/work/go/src/github.com/opendoor-labs/code/js/packages/awl/node_modules/@percy/dom/dist/index.js:1:8337
    at Module.m (/Users/nathan/work/go/src/github.com/opendoor-labs/code/js/packages/awl/node_modules/@percy/dom/dist/index.js:1:8582)
    at __puppeteer_evaluation_script__:5:20

Tracking the error down I think it's occurring on this line in @percy/dom but I unfortunately don't have enough knowledge in these matters to understand why the ownerNode might be undefined here. Does this just need an additional optional chaining parameter (e.g. innerText?.trim) or is there something more complicated going on here?

wwilsman commented 3 years ago

It looks like that is a bug with the new @percy/dom. Just as you said, another optional chaining parameter is probably needed there. This would probably better fit as a new issue on the CLI repo, but I've already created a PR (https://github.com/percy/cli/pull/87) and will merge and release a new version shortly. πŸ‘

Update: Released v1.0.0-beta.24

nathanshelly commented 3 years ago

sorry for the delayed response here but that worked perfectly. Thank you so much for your help!