Sporadic failures in silverstripe/admin CI

emteknetnz commented 5 months ago

Sporadic failures with behat + js job bundle.js in 4.13 here https://github.com/silverstripe/silverstripe-admin/actions/runs/7747592053

Caused an auto patch release for to fail https://github.com/silverstripe/silverstripe-admin/pull/1670

Update: moving back into backlog to chat about with team - see my comment below for explanation

The retry PR's below are kind of not great, they just add @retry annotation which literally means "wait for up to 3 seconds when doing an assertion". What we really need is to completely retry failed scenarios.

@retry PRs

PRs - Make sure they are merged / released in this exact order!

GuySartorelli commented 5 months ago

Anecdotally, the gridfield-navigation feature is the one I see failing the most. Could be a red herring, or there could be something to it.

GuySartorelli commented 4 months ago

I've noticed @retry on some of the behat features - some tags to actually have functionality tied to them, so we should check if this actually makes them get retried multiple times and if so we should add that tag to the scenarios in gridfield-navigation.

Edit: I'm trying @retry on the feature I mentioned above in https://github.com/silverstripe/silverstripe-admin/pull/1679 Let's see if that reduces the number of failures for admin behat.

GuySartorelli commented 4 months ago

For the js failures, when one fails, download the artifacts and try git diff --word-diff --word-diff-regex=[^;] (might need to play with the regex a bit to get good output) - the idea being the diff it shows should be showing you what is actually different between the two files, instead if just saying the whole file changed.

emteknetnz commented 4 months ago

Created an issue to do this is CI https://github.com/silverstripe/gha-ci/issues/104

emteknetnz commented 3 months ago

I had a good go at getting behat to re-run a scenario on failure, I just can't do it though without resorting to some major hacks. There's just doesn't seem to be enough exposed by behat hooks to allow me to do this.

This will get it to re-run in a 'good enough' manner in RuntimeFeatureTester.php

    public function test(Environment $env, $spec, $skip = false)
    {
        $results = array();
        foreach ($spec->getScenarios() as $scenario) {
            $tester = $scenario instanceof OutlineNode ? $this->outlineTester : $this->scenarioTester;

            $setup = $tester->setUp($env, $spec, $scenario, $skip);
            $localSkip = !$setup->isSuccessful() || $skip;
            $testResult = $tester->test($env, $spec, $scenario, $localSkip);
            $teardown = $tester->tearDown($env, $spec, $scenario, $localSkip, $testResult);

            // rerun here
            if (!$testResult->isPassed()) {
                $setup = $tester->setUp($env, $spec, $scenario, $skip);
                $localSkip = !$setup->isSuccessful() || $skip;
                $testResult = $tester->test($env, $spec, $scenario, $localSkip);
                $teardown = $tester->tearDown($env, $spec, $scenario, $localSkip, $testResult);
            }

            $integerResult = new IntegerTestResult($testResult->getResultCode());
            $results[] = new TestWithSetupResult($setup, $integerResult, $teardown);
        }

        return new TestResults($results);
    }

This is obviously directly modifying upstream code. Which we could do in CI in a very hacky way i.e. just in our own version of the file ... or cleaner just fork behat entirely and use our own fork and just never receive updates again..

You CAN use an @afterScenario hook in BasicContext.php in silverstripe behat-extension

    /** @AfterScenario */
    public function after(AfterScenarioScope $scope)
    {
        $env = $scope->getEnvironment();
        $spec = $scope->getFeature();
        $scenario = $scope->getScenario();
        $testResult = $scope->getTestResult();
        // $tester = 'nope'; :-(
    }

BUT there's just no way to get the $tester (Behat\Behat\Tester\Runtime\IsolatingScenarioTester) which would be required to redo the step from with the hook implementation. $env looks like it used to have the symfony serviceContainer available which might have it available, but now null and there was an inline comment somewhere saying get rid of $env in the next major, I guess cos to prevent the sort of silly code we'd need to do here.

I even tried bootstrapping a new Application and using reflection nonsense though I wasn't able to access it.

I'd rather not fork behat, if we really want retry perhaps the just copying in our own file in CI could work if we took an md5 signature of the file to be replaced and validate that it hasn't been updated upstream first, and if it has just exit 1? Or just don't copy at all in that scenario and put put a console warning somewhere that we hopefully notice at some point.

emteknetnz commented 2 months ago

@GuySartorelli mentioned that there's a way to access the service container during bootstrapping

GuySartorelli commented 2 months ago

See this PR for an example. We could have a subclass of the RuntimeFeatureTester in the behat-extension module and replace the default one in the container - assuming it's defined in the service container before our extension is bootstrapped.

The subclass would have the code from Steve's example above, or something similar - which would then mean we're rerunning all failed scenarios for any failure regardless of what module is being rerun.

I think it should be possible to get a boolean from the behat.yml into the extension for the purposes of setting this to happen only in CI.

sabina-talipova commented 2 months ago

Probably, we can just enhance those steps(function) that usually fail. There is some example how we can do this.

GuySartorelli commented 2 months ago

The suggestions in that link are essentially what the @retry tag already does - and that does alleviate the problem in some cases (after it was applied in this PR it did reduce the number of failures for that feature) but doesn't catch all intermittent failures.

emteknetnz commented 2 months ago

Had another good go at this, still no luck

While I can get access to the service container from the afterScenario hook, the tester isn't normally actually available then. I tried extracting the tester out instead of the container early on, however I was unable to use it later, it threw an error.

I think at this point we need to either fork behat or copy in patch file in CI (my preference)

// behat-extensions/src/Extension.php
// ...
    public function process(ContainerBuilder $container)
    {
        // ...
        // Set the tester in the tester holder singleton
        // This needs to happen in process() rather than load() because Injector isn't available in load()
        // Note that we put the tester into the holder rather than the entire container because the services
        // in the container actually greater between this point and the BasicContainer::after() hook, and the
        // tester isn't available at that point
        $tester = $container->get(TesterExtension::SCENARIO_TESTER_WRAPPER_TAG . '.isolating');
        $testHolder = Injector::inst()->get(TesterHolder::class, true);
        $testHolder->setTester($tester);
    }

// behat-extension/src/TesterHolder.php
<?php

namespace SilverStripe\BehatExtension;

use Behat\Behat\Tester\Runtime\IsolatingScenarioTester;

/**
 * Class intended to be used as a singleton to hold the IsolatingScenarioTester instance
 * which is accessible during the test run, so that it can be accessed from withing
 * BasicContext after the scenario has been run
 */
class TesterHolder
{
    private IsolatingScenarioTester $tester;

    public function getTester(): IsolatingScenarioTester
    {
        return $this->tester;
    }

    public function setTester(IsolatingScenarioTester $tester): void
    {
        $this->tester = $tester;
    }
}

// behat-extensions/src/Context/BasicContext
// ...
    /** @AfterScenario */
    public function after(AfterScenarioScope $scope)
    {
        $skip = false;
        $env = $scope->getEnvironment();
        $spec = $scope->getFeature();
        $scenario = $scope->getScenario();
        $testResult = $scope->getTestResult();
        $testerHolder = Injector::inst()->get(TesterHolder::class, true);
        $tester = $testerHolder->getTester();

        // re-run test if it failed
        // TODO: only do this in CI .. test custom config variable?
        if (!$testResult->isPassed()) {
            // This code is copied from RuntimeFeatureTester::test()
            $setup = $tester->setUp($env, $spec, $scenario, $skip);
            $localSkip = !$setup->isSuccessful() || $skip;
            $testResult = $tester->test($env, $spec, $scenario, $skip);
            // $teardown = $tester->tearDown($env, $spec, $scenario, $localSkip, $testResult);
        }
    }

Results in

  ╳  None of the registered environment handlers seem to support `Behat\Behat\Context\Environment\InitializedContextEnvironment` environment.
  │
  └─ @AfterScenario # SilverStripe\BehatExtension\Context\BasicContext::after()

GuySartorelli commented 2 months ago

We have a few options here that I can see. All of these are done within the load() method of SilverStripe\BehatExtension\Extension.

Option 1: Decorate the scenario tester

Click to see the `RetryScenarioTester` code

```php decoratedTester = $decoratedTester; } public function setUp(Environment $env, FeatureNode $feature, ScenarioInterface $scenario, $skip) { return $this->decoratedTester->setUp($env, $feature, $scenario, $skip); } public function test(Environment $env, FeatureNode $feature, ScenarioInterface $scenario, $skip) { $result = $this->decoratedTester->test($env, $feature, $scenario, $skip); if (!$skip && !$result->isPassed()) { /* Some logic goes here to retry the scenario */ } return $result; } public function tearDown(Environment $env, FeatureNode $feature, ScenarioInterface $scenario, $skip, TestResult $result) { return $this->decoratedTester->tearDown($env, $feature, $scenario, $skip, $result); } } ```

<?php

use Behat\Behat\Tester\ServiceContainer\TesterExtension;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Behat\Testwork\ServiceContainer\Extension as ExtensionInterface;
use SilverStripe\BehatExtension\Tester\RetryScenarioTester;
use Symfony\Component\DependencyInjection\Definition;
use Symfony\Component\DependencyInjection\Reference;

class Extension implements ExtensionInterface
{
    // ...
    public function load(ContainerBuilder $container, array $config)
    {
        // ...

        // Decorate the scenario tester with our own class so we can retry when they fail
        $definition = new Definition(RetryScenarioTester::class, [
            new Reference(TesterExtension::SCENARIO_TESTER_ID),
        ]);
        $definition->addTag(TesterExtension::SCENARIO_TESTER_WRAPPER_TAG);
        $container->setDefinition(TesterExtension::SCENARIO_TESTER_WRAPPER_TAG . '.silverstripe', $definition);
    }
}

Option 2: Replace the scenario tester with our own class

Unfortunately RuntimeScenarioTester is set as a final class for some reason - so we can't subclass it. We'd have to copy the code and change it to suit our needs.

<?php

use Behat\Behat\Tester\ServiceContainer\TesterExtension;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Behat\Testwork\ServiceContainer\Extension as ExtensionInterface;
use SilverStripe\BehatExtension\Tester\RetryScenarioTester;
use Symfony\Component\DependencyInjection\Definition;
use Symfony\Component\DependencyInjection\Reference;

class Extension implements ExtensionInterface
{
    // ...
    public function load(ContainerBuilder $container, array $config)
    {
        // ...

        // Replace the scenario tester with our own class so we can retry when they fail
        $definition = new Definition(RetryScenarioTester::class, array(
            new Reference('tester.step_container'),
            new Reference(TesterExtension::BACKGROUND_TESTER_ID)
        ));
        $container->setDefinition(TesterExtension::SCENARIO_TESTER_ID, $definition);
    }
}

Option 3: Replace the feature tester with our own class

Unfortunately RuntimeFeatureTester is set as a final class for some reason - so we can't subclass it. We'd have to copy the code and change it to suit our needs.

<?php

use Behat\Behat\Tester\ServiceContainer\TesterExtension;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Behat\Testwork\Environment\ServiceContainer\EnvironmentExtension;
use Behat\Testwork\ServiceContainer\Extension as ExtensionInterface;
use SilverStripe\BehatExtension\Tester\RetryFeatureTester;
use Symfony\Component\DependencyInjection\Definition;
use Symfony\Component\DependencyInjection\Reference;

class Extension implements ExtensionInterface
{
    // ...
    public function load(ContainerBuilder $container, array $config)
    {
        // ...

        // Replace the feature tester with our own class so we can retry when they fail
        $definition = new Definition(RetryFeatureTester::class, [
            new Reference(TesterExtension::SCENARIO_TESTER_ID),
            new Reference(TesterExtension::OUTLINE_TESTER_ID),
            new Reference(EnvironmentExtension::MANAGER_ID)
        ]);
        $container->setDefinition(TesterExtension::SPECIFICATION_TESTER_ID, $definition);
    }
}

It is also possible to decorate the feature tester, but from the quick look I took we don't actually have enough information to know which scenario we need to retry if we do that.

Disclaimer

I was exploring what options we have for using dependency injection to avoid hot-patching in CI. I didn't explore what needs to happen to actually retry scenarios.

emteknetnz commented 2 months ago

Good thinking! I totally forgot that yes, when there's dependency injection available that's the correct way to add/override code :-)

I've added some PRs to implement this

GuySartorelli commented 2 months ago

First PR merged, reassigning to Steve for next steps

GuySartorelli commented 2 months ago

Second PR merged, reassigning to Steve for next steps

emteknetnz commented 2 months ago

@GuySartorelli Have released new versions for behat-extension, good to merge the other 2 PRs now

GuySartorelli commented 2 months ago

PRs merged. Reassigning to Steve to validate CI isn't broken

silverstripe / silverstripe-admin