percy / cli

The Percy CLI is used to interact with, and upload snapshots to, percy.io via the command line.
https://docs.percy.io/docs/cli-overview
70 stars 43 forks source link

Can't retry when taking snapshot fails #1754

Open snps-halkoaho opened 2 hours ago

snps-halkoaho commented 2 hours ago

The problem

When snapshotting fails for whatever reason with message "Could not take DOM snapshot" printed from Percy, there doesn't seem to be any way to retry taking the snapshot.

Environment

Details

We have EPIPE errors now and then, and haven't been able to get rid of them, so we tried to implement our own retry logic. But there is no error thrown from the snapshot function and no return value.

The place where the error is printed from is here: https://github.com/percy/percy-testcafe/blob/2b3dd7a2f2fd0f4020c3ca67b3cc1272925520a8/index.js#L57 . As you can see, the error is caught and only printed to the log, but no information provided to the caller.

It would be nice to either have the retry functionality built in to Percy, or at least give some information to the caller when snapshotting fails. I'm aware of the retry setting in Percy configuration https://www.browserstack.com/docs/percy/take-percy-snapshots/overview, but it only applies to asset discover, and doesn't seem to do anything in my case.

I'm also aware of the documentation here: https://www.browserstack.com/docs/percy/troubleshoot/self-debug#smart-debugging-snapshots-missing-or-failed mentioning reducing concurrency could fix missing snapshot issues, and I'm giving it a try. But it would still be nice to be able to retry the snapshotting for all kinds of issues. Or just know in code that it failed.

Debug logs

14:17:26 [1] [percy:client] Uploading resources for 36796423... (297ms) 14:17:26 [1] [percy:client] Uploading 1.6MB resource: ... (3ms) 14:17:26 [1] [percy:client] Uploading 13.1kB resource: /percy.1728472645915.log... (0ms) 14:17:27 [1] [percy:client] Finalizing snapshot 2008921033... (975ms) 14:17:31 [1] [percy:testcafe] Could not take DOM snapshot "" 14:17:31 [1] [percy:testcafe] Error: write EPIPE 14:17:31 [1] at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:95:16) 14:17:31 [1] at WriteWrap.callbackTrampoline (node:internal/async_hooks:130:17)

ninadbstack commented 1 hour ago

@snps-halkoaho Thanks for the detailed report. The usual reason for a DOM snapshot to fail is mostly around conflicting js/js error that causes percy's DOM snapshot js script to fail execution. EPIPE might be related to conflicting JS or most likely related to some http request failure [ i.e. testcafe proxy failure or so ]

So most likely we will need to debug root cause here to determine whats failing.

That said we can certainly add an option that would throw exception instead of suppressing it. The original goal was to make sure that a snapshot failure should not break functional tests by throwing exception.

For next steps:

Let me know and we can take this forward.

snps-halkoaho commented 1 hour ago

@ninadbstack Thank you for the quick response!

Are you able to reproduce above with a small example

I'm not able to reproduce the error in any consistent manner. It happens for 1 snapshot (different one every time) in maybe 1 out of 4 percy runs. Each run has about 170 snapshots, so by quick math it happens about once in 680 snapshots. I wasn't really looking for a solution to the EPIPE error, just a way to get around it. Since it happens so rarely, I'm quite sure a retry would work. I also wouldn't discount random network glitches as the reason.

Can you move to latest version of CLI ? it has better logging which might help debugging the issue.

We are keeping up with updates, but I don't know if I'll have time to update it in the coming weeks. If I found some more information when we do update, I'll definitely keep you posted.

The original goal was to make sure that a snapshot failure should not break functional tests by throwing exception.

Not breaking the tests when snapshotting fails would definitely be a good goal. Returning something instead of throwing an error could be a less disruptive option. For example, returning a boolean that tells the caller whether the snapshotting succeeded or not.