wix / Detox

Gray box end-to-end testing and automation framework for mobile apps
https://wix.github.io/Detox/
MIT License

iOS Detox tests fail in Github actions CI with multiple workers - Exceeded timeout of 300000ms while handling jest-circus "setup" event #3574

Closed zacharyweidenbach closed 1 year ago

zacharyweidenbach commented 2 years ago

What happened?

First reported with a comment on a separate issue: https://github.com/wix/Detox/issues/3342#issuecomment-1228850607

Running tests for iOS with 3 workers, using the Detox CLI's --workers option, leads to intermittent test failures. It seems to be related to the instantiation of new simulators to support the concurrent workers. When running only 1 worker, I am unable to reproduce this issue.

CI: Github Actions

As a workaround, I have created a simple test suite consisting of only 3 tests. These tests run with 3 workers so that Detox will create new simulator instances for them. These 3 tests usually fail with the timeout issue; however, when the actual test suite runs immediately after, it does not encounter the timeout. This leads me to believe that if the simulators have already been created, Detox skips this step and avoids the bug altogether.

Attached are the logs with trace logging enabled from when the errors occurred.

The command to run this step is as follows:

      - name: Initialize Detox Tests
        id: detox-init
        continue-on-error: true
        run: yarn detox test --use-custom-logger false --loglevel warn --configuration ios.sim.gha --cleanup --workers 3 --runner-config e2e/InitCITests/config.json --keepLockFile true

What was the expected behaviour?

The jest-circus setup step would not time out.

Was it tested on latest Detox?

Did your test throw out a timeout?

Help us reproduce this issue!

No response

In what environment did this happen?

Detox version: 19.10.0
React Native version: 0.67.2
Node version: 14.18.1
Device model: iPhone 13
iOS version: 14.5
macOS version: 12
Xcode version: 13.4.1
Test runner: jest-circus

Detox logs

[18_Initialize Detox Tests-1.txt](https://github.com/wix/Detox/files/9456312/18_Initialize.Detox.Tests-1.txt)

More data, please!

No response

gabrieldonadel commented 2 years ago

I've been experiencing this exact same problem for a while now. The only solution I found was to use only one worker at a time.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe the issue is still relevant, please test on the latest Detox and report back.

Thank you for your contributions!

For more information on bots in this repository, read this discussion.

stale[bot] commented 1 year ago

The issue has been closed for inactivity.

gabrieldonadel commented 1 year ago

Can we reopen this @d4vidi ?

AdamTyler commented 1 year ago

I'm having this issue on GH Actions after updating the runner from macos-11 to macos-12.

It seems the simulator doesn't boot fast enough, since it's the first use of it. I have maxWorkers at 1. bootstatus runs but doesn't seem to report anything; eventually the timeout fires and the tests don't get assigned to a simulator:

Oct 28 15:53:59 detox[45848] DEBUG: [EXEC_CMD, #2] /usr/bin/xcrun simctl boot 0336474D-27B1-4220-8EC9-DD82EDEBC404 
Oct 28 15:53:59 detox[45848] DEBUG: [EXEC_TRY, #2] Booting device 0336474D-27B1-4220-8EC9-DD82EDEBC404...
Oct 28 15:54:02 detox[45848] TRACE: [EXEC_SUCCESS, #2] 
Oct 28 15:54:02 detox[45848] DEBUG: [EXEC_CMD, #3] /usr/bin/xcrun simctl bootstatus 0336474D-27B1-4220-8EC9-DD82EDEBC404
Oct 28 15:55:58 detox[45848] ERROR: Exceeded timeout of 120000ms while handling jest-circus "setup" event
Oct 28 15:56:09 detox[45848] INFO:  Signup (Phone) is assigned to undefined

gabrieldonadel commented 1 year ago

This is still a problem

d4vidi commented 1 year ago

As far as I can tell from the logs, it appears that the simulators refuse to boot:

2022-08-30T17:24:37.6595290Z 17:24:35.471 detox[46305] TRACE: [EXEC_SUCCESS, #4] Monitoring boot status for iPhone 13-Detox (8B0919E4-BF8C-44C8-85C5-B327301D17F9).
2022-08-30T17:24:37.6698290Z [2022-08-30 17:17:04 +0000] Status=2, isTerminal=NO, Elapsed=03:18.
2022-08-30T17:24:37.6801620Z    Waiting on Data Migration
2022-08-30T17:24:37.6904190Z        Reason:Running plugin com.apple.managedconfiguration.migrator (MCProfile.migrator, user-agnostic)
2022-08-30T17:24:37.7011750Z        Migration Elapsed:00:00 seconds
...
...
...
2022-08-30T17:24:46.4004910Z        Reason:Running plugin com.apple.managedconfiguration.mdm.migrator (MCMDM.migrator)
2022-08-30T17:24:46.4105920Z        Migration Elapsed:03:49 seconds
2022-08-30T17:24:46.4206390Z 
2022-08-30T17:24:46.4307930Z [2022-08-30 17:24:35 +0000] Status=2, isTerminal=NO, Elapsed=07:10.

Even after 7 minutes, they are stuck in the Data Migration step. Here's what a normal boot up looks like:

Monitoring boot status for iPhone 13-Detox (8F9D3538-64FA-4DF5-9492-E2C093E15748).
[2022-11-28 10:36:46 +0000] Status=2, isTerminal=NO, Elapsed=00:12.
    Waiting on Data Migration
        Reason:Running plugin com.apple.-0LaunchServicesMigrator (00LaunchServicesMigrator.migrator, user-agnostic)
        Migration Elapsed:00:10 seconds
...
...
...
[2022-11-28 10:36:52 +0000] Status=2, isTerminal=NO, Elapsed=00:17.
    Waiting on Data Migration
        Reason:Running plugin com.apple.managedconfiguration.mdm.migrator (MCMDM.migrator)
        Migration Elapsed:00:15 seconds

[2022-11-28 10:36:53 +0000] Status=2, isTerminal=NO, Elapsed=00:18.
    Waiting on Data Migration
        Reason:Running plugin com.apple.PreferencesMigrator (PreferencesMigrator.migrator)
        Migration Elapsed:00:16 seconds

[2022-11-28 10:36:54 +0000] Status=4, isTerminal=NO, Elapsed=00:19.
    Waiting on System App

[2022-11-28 10:36:55 +0000] Status=4, isTerminal=NO, Elapsed=00:21.
    Waiting on System App

[2022-11-28 10:36:56 +0000] Status=4, isTerminal=NO, Elapsed=00:21.
    Waiting on System App

[2022-11-28 10:36:57 +0000] Status=4, isTerminal=NO, Elapsed=00:22.
    Waiting on System App

[2022-11-28 10:36:58 +0000] Status=4, isTerminal=NO, Elapsed=00:23.
    Waiting on System App

[2022-11-28 10:36:58 +0000] Status=4, isTerminal=NO, Elapsed=00:24.
    Waiting on System App

[2022-11-28 10:36:59 +0000] Status=4, isTerminal=NO, Elapsed=00:25.
    Waiting on System App

[2022-11-28 10:37:01 +0000] Status=4, isTerminal=NO, Elapsed=00:26.
    Waiting on System App

[2022-11-28 10:37:02 +0000] Status=4, isTerminal=NO, Elapsed=00:27.
    Waiting on System App

[2022-11-28 10:37:03 +0000] Status=4294967295, isTerminal=YES, Elapsed=00:28.
    Finished
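
The difference between the two boot logs above can be checked mechanically. The following is a minimal TypeScript sketch, not part of Detox, that parses `Status=..., isTerminal=..., Elapsed=...` lines and flags a boot that is still non-terminal after a chosen limit. The names `parseBootStatus` and `looksStuck` are hypothetical, the line format is inferred only from the logs quoted in this thread, and `Elapsed` is assumed to be minutes:seconds.

```typescript
// Hypothetical helper: summarises `xcrun simctl bootstatus` output.
// The line format is inferred from the logs in this thread, not from Apple docs.
interface BootSnapshot {
  status: number;
  isTerminal: boolean;
  elapsedSeconds: number; // assumes Elapsed is printed as mm:ss
}

const SNAPSHOT_RE = /Status=(\d+), isTerminal=(YES|NO), Elapsed=(\d+):(\d+)\./;

// Collects one snapshot per status line; all other lines are ignored.
function parseBootStatus(output: string): BootSnapshot[] {
  const snapshots: BootSnapshot[] = [];
  for (const line of output.split('\n')) {
    const match = SNAPSHOT_RE.exec(line);
    if (match === null) continue;
    snapshots.push({
      status: Number(match[1]),
      isTerminal: match[2] === 'YES',
      elapsedSeconds: Number(match[3]) * 60 + Number(match[4]),
    });
  }
  return snapshots;
}

// True when the most recent snapshot is non-terminal and older than the limit.
function looksStuck(snapshots: BootSnapshot[], limitSeconds: number): boolean {
  const last = snapshots[snapshots.length - 1];
  return last !== undefined && !last.isTerminal && last.elapsedSeconds > limitSeconds;
}
```

With a limit of 300 seconds, the failing log (last snapshot `isTerminal=NO, Elapsed=07:10`) would be flagged as stuck, while the normal log, which ends with `isTerminal=YES`, would not.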

Something about the GitHub Actions environment is keeping Detox from creating the simulator duplicates it needs for concurrency. I don't have a solution for this; I can only offer some words of "wisdom":

  1. Dig into the GitHub Actions documentation and look for related limitations. Consider running the commands Detox runs yourself, so you can experiment with the environment and come up with a workaround.
  2. Consider creating simulator clones in advance, the way Detox does. Detox will use those if they are available.
  3. Consider splitting your suite into several complementary GitHub Actions.
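
Suggestions 1 and 2 above can be sketched as a small TypeScript helper that only builds the `xcrun simctl` invocations, so they can be reviewed or run by hand on the CI machine. This is a sketch under assumptions: `cloneCommands` and `bootCommands` are hypothetical names, the `<name>-Detox` naming is copied from the device name visible in the logs ("iPhone 13-Detox"), and whether Detox reuses devices created this way is based only on the comment above. The `boot` and `bootstatus` subcommands are the ones shown in the Detox logs in this thread.

```typescript
// Hypothetical sketch: builds simctl commands for pre-creating and pre-booting
// one simulator per worker. Nothing is executed here.
type Command = string[];

// deviceName: e.g. "iPhone 13".
// deviceTypeId: a CoreSimulator device type identifier, as printed by
// `xcrun simctl list devicetypes`.
function cloneCommands(deviceName: string, deviceTypeId: string, workers: number): Command[] {
  const commands: Command[] = [];
  for (let i = 0; i < workers; i++) {
    // Assumption: Detox looks for devices named "<device name>-Detox".
    commands.push(['xcrun', 'simctl', 'create', `${deviceName}-Detox`, deviceTypeId]);
  }
  return commands;
}

// Boots a device and then waits on its boot status, mirroring the
// EXEC_CMD lines in the logs above.
function bootCommands(udid: string): Command[] {
  return [
    ['xcrun', 'simctl', 'boot', udid],
    ['xcrun', 'simctl', 'bootstatus', udid],
  ];
}
```

Each array could be passed to something like Node's `execFileSync(cmd[0], cmd.slice(1))` in a CI step that runs before `detox test`. `simctl create` prints the new device's UDID, which is what `bootCommands` expects as input.
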

remcoabalain commented 1 year ago

I've got the same problem with Detox in Microsoft DevOps Pipelines. With one worker it works; with more than one, it fails. Same timeout result.

AdamTyler commented 1 year ago

@d4vidi I'm actually running into this issue without concurrency. I'm trying to update my runner to macos-12 and iPhone 13 where previously I was on macos-11 and iPhone 11.

So I'm thinking there is some conflict between newer GH runners and Detox.

brycnguyen commented 1 year ago

I'm also running into this same issue and can only run with 1 worker.

huszzsolt commented 1 year ago

I have the same issue running 2 iOS workers on Bitrise CI/CD.

gabrieldonadel commented 1 year ago

This is still a problem

Zakyyy commented 1 year ago

Any solution?

stale[bot] commented 1 year ago

The issue has been closed for inactivity.

marinefourton commented 1 year ago

This is still a problem

sergtimosh commented 1 year ago

@zacharyweidenbach could you please elaborate on the --runner-config e2e/InitCITests/config.json part? What does this alternative configuration contain?

zacharyweidenbach commented 1 year ago

@sergtimosh

It is just a pretty standard config file:

{
  "testEnvironment": "../environment",
  "testRunner": "jest-circus/runner",
  "testTimeout": 100000,
  "testRegex": "\\.initci\\.ts$",
  "reporters": ["detox/runners/jest/streamlineReporter"],
  "verbose": true
}

The test regex was targeting 3 test files called worker1.initci.ts, worker2.initci.ts, and worker3.initci.ts. Each of these has a bare-bones test in it:

import { device } from 'detox';

test('It should start worker 1', () => device.launchApp());

I am no longer working on the codebase where I had this problem, but I was never able to solve it.

sergtimosh commented 1 year ago

@zacharyweidenbach

Thanks for the reply! By "I was never able to solve it", do you mean you never found a clean solution apart from the workaround you described in the issue, or that the workaround didn't work at all?

zacharyweidenbach commented 1 year ago

@sergtimosh

Sorry, I mean I was never able to solve this problem without the workaround.