jenkinsci / cucumber-reports-plugin

Jenkins plugin to generate cucumber-jvm reports
https://plugins.jenkins.io/cucumber-reports/
GNU Lesser General Public License v2.1

Is it possible to consider "flaky" scenarios as not failed? #315

Closed — kingdonb closed this issue 5 years ago

kingdonb commented 5 years ago

I've recently discovered the --retry option of cucumber-ruby, and I've been trying to use it to make my PR builder, which reports checks for my very long test suite, a bit more reliable. We have some underprovisioned and contentious APIs that are fairly likely to cause an occasional bomb-out result somewhere during our nearly hour-long test suite, but with retries we fortunately very rarely see the same tests failing more than twice in a row:

218 scenarios (1 flaky, 217 passed)
2048 steps (1 failed, 14 skipped, 2033 passed)
57m32.229s

So now, with --retry 2 and --no-strict-flaky, I've been able to get Cucumber to report these kinds of occasional failures as successful results, without resorting to something like percentages! Hooray!

I'm capturing the return code, and I can see that Cucumber is returning 0, as expected. I could depend on the Cucumber return code to mark the build as failed. Right now, if it returns 1, I capture that too and discard my code coverage report before it gets posted, then continue and allow the build to complete so that Cucumber Reports can be posted in a post-build step.

So I'm using the Cucumber Reports plugin to mark failing results based on the information in the cucumber JSON report file. In the configuration dialog (for v4.3.0), there is a space for "Maximum number of {failed,skipped} {steps,scenarios,features}", and I've increased the allowed number of failed and skipped steps to something like 6 and 20.

(I've now upgraded from 4.3.0 to 4.6.0, and some of those options are gone; "Ignore bad steps" is the option that replaced them — a bit more inscrutable, but not worse.) But apparently (at least in v4.3.0) Cucumber Reports reads that one flaky scenario as a failed scenario:

[CucumberReport] Using Cucumber Reports version 4.3.0
[CucumberReport] JSON report directory is "features/reports/"
[CucumberReport] Copied 1 json files from workspace "..." to reports directory "..."
[CucumberReport] Copied 0 properties files from workspace "..." to reports directory "..."
[CucumberReport] Processing 1 json files:
[CucumberReport] /var/jenkins_home/.../results.json
[CucumberReport] Found 1 failed scenarios, while expected not more than 0
[CucumberReport] Build status is changed to FAILURE
Build step 'Cucumber reports' changed build result to FAILURE
Finished: FAILURE

Since these workers are cleaned up right after the build, it's hard for me to see whether results.json reports those scenarios as flaky or failed; I would need to reproduce a flaky result for myself outside of the pipeline, or otherwise snag a copy of results.json somehow. I thought I could go to great lengths to capture a flaky scenario in results.json and see for myself whether it's even possible to fix this issue right here in Cucumber Reports, or I could just ask about it...

Have you seen this configuration before, and do you know if it's possible for Cucumber Reports to treat flaky scenarios differently than failed ones?

I'll swing back around in an hour or so, after my build has completed with the latest 4.6.0 reports plugin, and mark this closed if it looks like it's already resolved. (And if cucumber-ruby is reporting it as failed rather than flaky in results.json, maybe there's nothing you can do here until they fix that, and I'll go refile the issue there.)

kingdonb commented 5 years ago

It looks like Cucumber Reports 4.6.0 unfortunately might not even be loaded for me right now, because of a too-old Token Macro Plugin (v2.6) whose upgrade seems to depend on a newer version of Jenkins than I had installed.

I think the changes I described in the configuration interface might be coming from another Cucumber plugin I had installed. It might be a while before I can report back about whether 4.6.0 has solved this or not.

damianszczepanik commented 5 years ago

So what are you requesting? I'm quite confused now.

kingdonb commented 5 years ago

It will be easier for me to ask with a concrete example results.json; let me go make a flaky test and see whether the output reflects "flaky" vs. "failed".

kingdonb commented 5 years ago

OK, I created a simple minimal example of a test that fails sometimes and passes other times, so I can make this request in isolation from all the other baloney I have going on...

Using the default profile...
Feature: Janky feature

  Scenario: janky example  # features/janky.feature:2
    Given the test example # features/step_definitions/janky_steps.rb:1
       (RuntimeError)
      ./features/step_definitions/janky_steps.rb:3:in `"the test example"'
      features/janky.feature:3:in `Given the test example'
      Capybara::Driver::Base#save_screenshot (Capybara::NotSupportedByDriverError)
      ./features/support/hooks.rb:12:in `After'

  Scenario: janky example  # features/janky.feature:2

1 scenario (1 flaky)
2 steps (1 failed, 1 passed)
0m0.651s

Above is the output of my flaky test; I was careful not to use the word "flaky" in any scenarios, examples, or steps.

Here is the JSON output (gist). Note that it does not include the word "flaky" anywhere, either :sad_trombone_sound:

Is there enough information here to determine that there was a flaky test result, and the failure can be discarded?
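Whether there is enough information depends on how the report gets rolled up. As a sketch (not the plugin's actual code), if a reader of the JSON marks a scenario attempt failed whenever any of its steps failed, then a retried scenario necessarily contributes one failure. The JSON below is illustrative, mimicking the shape of the gist's format rather than copied from it:

```ruby
require 'json'

# Illustrative stand-in for a cucumber-ruby results.json with one retried
# scenario: the first attempt fails a step, the retry passes. Field names
# follow the usual cucumber JSON shape; values here are made up.
report = JSON.parse(<<~DATA)
  [
    { "elements": [
      { "id": "janky-feature;janky-example",
        "steps": [ { "result": { "status": "failed" } },
                   { "result": { "status": "passed" } } ] },
      { "id": "janky-feature;janky-example",
        "steps": [ { "result": { "status": "passed" } },
                   { "result": { "status": "passed" } } ] }
    ] }
  ]
DATA

scenarios = report.flat_map { |feature| feature["elements"] }

# Naive rollup: an attempt fails if any step failed, and the build fails
# if any attempt failed -- so a flaky-then-passing scenario still counts
# as one failed scenario, matching the plugin log above.
failed_attempts = scenarios.count do |sc|
  sc["steps"].any? { |step| step["result"]["status"] == "failed" }
end

puts "#{failed_attempts} failed scenario(s) out of #{scenarios.size} attempts"
```

Under this reading, the "Found 1 failed scenarios" message falls out directly, even though the retry passed.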

kingdonb commented 5 years ago

Here is my flaky scenario and step definition, and the other steps needed to reproduce this, in case they are needed:

$ cat features/janky.feature
Feature: Janky feature
  Scenario: janky example
    Given the test example
$ cat features/step_definitions/janky_steps.rb
Given('the test example') do
  if rand(100) > 30
    fail
  end
end
$ bundle exec cucumber --retry 3 features/janky.feature

(Try this a few times; sooner or later you will get a result that looks like:

1 scenario (1 flaky)
2 steps (1 failed, 1 passed)

)

Again, I apologize for making a report from an older version, but when I tried upgrading cucumber-reports from 4.3.0 to 4.6.0, I learned that my Jenkins server was already too old, my build configuration was accidentally nuked, and I wound up going down the rabbit hole of rebuilding the Jenkins server so that I could test against the latest version of the cucumber-reports plugin.

I still haven't tried that, because repairing my actual build was a higher priority, and the flaky tests are not so common that I can't just run the build again, or run the tests marked failed by hand. So right now I have a workaround that works OK on the old version, but it is somewhat annoying. I have yet to confirm this is still an issue on 4.6.0, but I suspect it is.

(Later tonight, I will repeat this build on a fresh Jenkins server I have stood up, which has the latest versions of LTS Jenkins and this plugin. If you want to wait until after I have done that before you spend any time looking at this, please feel free.)

It may make sense to submit this upstream: since the cucumber-ruby client outputs "1 scenario (1 flaky)", it could certainly be easier for them to report a flaky test using the word "flaky" somewhere in the output file than for you to somehow intuit it from results.json, which might or might not have enough information; I don't know... that's the file I included in the gist linked above.

But in any case, they may decide not to change the output format, which would be the convenient thing. Or maybe it's a standard format they won't be willing to break away from, and "flaky" is not part of the standard reporting, so they may want to avoid a change that could break downstream consumers like cucumber-reports. Even if they do make a change, it will probably require an additional change over here, too, in order to get Cucumber Reports to do the right thing with it.

I depend on Cucumber Reports not only to deliver the report, but also to mark the build as failed. If I mark the build as failed any earlier (say, from the return code of cucumber-ruby, which correctly reflects the build status as passed when flaky tests have not permanently failed), then post-build steps will be skipped. I am pretty sure Cucumber Reports (which of course runs as a post-build action) would never generate the HTML reports, so my build result would be missing my beautiful reports about what went wrong, along with any other post-build steps.

Maybe there is some other workaround, but if it were possible to resolve this here, I'd greatly appreciate any ideas about how! Thanks --Kingdon

damianszczepanik commented 5 years ago

It is still not clear to me what you want me to implement as a new feature or bug fix.

kingdonb commented 5 years ago

If it's not clear, that's OK. I'm at a workshop today, but I will follow up later with screenshots from Jenkins to try to make it clearer. I still have to test it against the latest version of cucumber-reports.

The desired behavior is: when "flaky" scenarios have passed after some number of retries, Cucumber Reports should be configurable to disregard all failing or skipped steps, and let the build be marked as passed. When I tested this with 4.3.0, it seemed like that wasn't possible.

(I could set a threshold for "number of allowed skipped steps" or "number of allowed failed steps", but if there is some setting like -1 for "unlimited", I didn't find it. I guess now, with percentage-based counting support, I could set it to 100% and that would work.)

It seemed like it was counting the repeated scenario as one failed plus one passed scenario. I would like the scenario that was repeated because of the --retry setting to be counted only once, so it doesn't cause the build to be marked as failed.

I still have yet to plug all of the parts in together and prove conclusively that this is still an issue for 4.6.0.

damianszczepanik commented 5 years ago

Does it look like https://github.com/damianszczepanik/cucumber-reporting/pull/692 ?

kingdonb commented 5 years ago

That PR might resolve it for me; maybe it's worth a try! I'm willing to test it out. Is there a build robot which has already produced an .hpi file, or should I take a stab at building it?

(I think I've built a Jenkins plugin from source before; it's just Maven, right...)

kingdonb commented 5 years ago

I still have to try that PR.

Here's the output I started with, before I tried configuring a number of skipped steps to allow:

213 scenarios (1 flaky, 212 passed)
1978 steps (1 failed, 8 skipped, 1969 passed)
52m28.494s
Coverage report generated for features, specs to /home/jenkins/workspace/pa/pa--pr-builder/coverage. 25432 / 28203 LOC (90.17%) covered.
+ CUCUMBER_RESULT=0
+ '[' 0 == 0 ']'
+ '[' 0 == 0 ']'
+ rake codecov:upload
/usr/local/bundle/gems/activesupport-4.2.11.1/lib/active_support/core_ext/object/duplicable.rb:111: warning: BigDecimal.new is deprecated; use BigDecimal() method instead.
2019-05-06 21:51:30 WARN Selenium [DEPRECATION] Selenium::WebDriver::Chrome#driver_path= is deprecated. Use Selenium::WebDriver::Chrome::Service#driver_path= instead.
{"uploaded": true, "url": "https://codecov.io/bitbucket/nd-oit/personnel-actions/commit/1234abcd", "queued": true, "meta": {"status": 200}, "message": "Coverage reports upload successfully", "id": "xyz"}
+ echo 'Uploaded simplecov report to codecov'
Uploaded simplecov report to codecov
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 81 killed;
[ssh-agent] Stopped.
Recording test results
[DocLinks] Copying Coverage Reports to 1 ...
[CucumberReport] Using Cucumber Reports version 4.3.0
[CucumberReport] JSON report directory is "features/reports/"
[CucumberReport] Copied 1 json files from workspace "/home/jenkins/workspace/pa/pa--pr-builder/features/reports" to reports directory "/var/jenkins_home/jobs/pa/jobs/pa--pr-builder/builds/71/cucumber-html-reports/.cache"
[CucumberReport] Copied 0 properties files from workspace "/home/jenkins/workspace/pa/pa--pr-builder/features/reports" to reports directory "/var/jenkins_home/jobs/pa/jobs/pa--pr-builder/builds/71/cucumber-html-reports/.cache"
[CucumberReport] Processing 1 json files:
[CucumberReport] /var/jenkins_home/jobs/pa/jobs/pa--pr-builder/builds/71/cucumber-html-reports/.cache/results.json
[CucumberReport] Found 8 skipped steps, while expected not more than 0
[CucumberReport] Build status is changed to FAILURE
Build step 'Cucumber reports' changed build result to FAILURE
Finished: FAILURE

Again, I apologize that I haven't upgraded this instance, so this is a few versions back... it's my working Jenkins behind firewalls, and it's daunting to rebuild it, which is what I think I have to do at this point to upgrade everything safely, because I'm a bit too far behind and some plugins have made breaking changes.

I have to follow this up still. Thank you for not already closing my issue report.

My actual problem is that when Cucumber Reports marks the build as failed, CodeCov apparently gets this signal somehow, and no code coverage reports are posted on the PR. I wouldn't mind this build being reported as failed, except for that. Flaky tests are bad news and should be treated with extreme prejudice, but the codecov reports will be good enough and should be posted.

I don't fully understand how CodeCov gets this news, though, because builds which are not reporting failures or flaky tests at all still get marked up by CodeCov, and there is no other post-build step that invokes Codecov, from what I can tell. And yet the PRs I've already closed one way or another still receive checkmarks from codecov, and this one doesn't.

damianszczepanik commented 5 years ago

You can try https://github.com/jenkinsci/cucumber-reports-plugin/blob/master/src/main/java/net/masterthought/jenkins/CucumberReportPublisher.java#L416 so CodeCov should be happy

karlhe commented 5 years ago

I have the same request as @kingdonb. I tried building the PR but wasn't really successful (I can build the jar, but can't figure out how to build the .hpi out of it?).

The gist of the problem is, as @kingdonb said: when you run cucumber with the retry flag, the plugin marks the build as a FAILURE even if a subsequent retry passed. This shows up in the cucumber report as something like this:

[screenshot of the cucumber report]

I'm running the plugin with the following command:

cucumber buildStatus: 'FAILURE', fileIncludePattern: 'artifacts/reports/**/*.json', trendsLimit: 10

I do think it should mark the build as failed if all retry attempts fail, but in this case it succeeds on the second try. It would be nice to have the option to mark it as unstable or success instead; otherwise the retry function in cucumber doesn't really serve a purpose.

To be fair, the reason this happens is that the JSON output of cucumber just emits an entry for every try.
Related issue that didn't really go anywhere: https://github.com/cucumber/cucumber-ruby/issues/1333
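Since the JSON emits one entry per try, one conceivable workaround (a hypothetical pre-processing step, not a feature of this plugin) would be to reduce the report before handing it to the plugin, keeping only the last attempt of each scenario id, which under --retry is the attempt whose result cucumber acted on:

```ruby
require 'json'

# Hypothetical reducer: within each feature, collapse scenario entries
# sharing an id down to the last one seen (later attempts overwrite
# earlier ones in the hash). Assumes retries share an id, as observed
# with cucumber-ruby in this thread.
def keep_last_attempts(report)
  report.map do |feature|
    last = {}
    feature["elements"].each { |sc| last[sc["id"]] = sc }  # later wins
    feature.merge("elements" => last.values)
  end
end

# Illustrative data: scenario "a;1" fails, then passes on retry.
report = JSON.parse(<<~DATA)
  [
    { "elements": [
      { "id": "a;1", "steps": [ { "result": { "status": "failed" } } ] },
      { "id": "a;1", "steps": [ { "result": { "status": "passed" } } ] },
      { "id": "a;2", "steps": [ { "result": { "status": "passed" } } ] }
    ] }
  ]
DATA

reduced = keep_last_attempts(report)
puts "#{reduced.first["elements"].size} scenario(s) after reduction"
```

The trade-off is that the failed attempt disappears from the report entirely, which loses the visibility into flakiness that kingdonb wants to keep.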

damianszczepanik commented 5 years ago

The more I understand your need, the less happy I am to accept this change request, as it should be possible to configure such behavior at the test level, not the report level.

kingdonb commented 5 years ago

As I have thought it over, I feel the same way. cucumber-ruby seems to have the capacity to report flaky tests as either successful or failing via return code (e.g. --retry 2 --no-strict-flaky), so it should be configurable one way or the other in the JSON reporting, too. But I am not sure whether the flaky state is something that can be reported and captured explicitly in reports, or whether that was decided by some cucumber standards body; nor whether this is behavior cucumber-ruby can alter by themselves without going further upstream. I have to imagine it's in the spec, though.

The other issue to consider is that only scenarios are reported as flaky; steps and features are (reasonably) not considered flaky. But the cucumber-reports plugin considers the return status of steps and features, so you still have to opt into allowing a certain number of failing and skipped steps and features in order to get a passing result. I can certainly crank those up to 100 or 1000+ so that my --retry 3 setting is unlikely to trip a threshold and mark the build as failed for reporting too many skipped or failed steps.

It seems like changes might be needed on either side of the interface to make this work well with --retry. I have very nearly finished upgrading my CI and dev stacks so that I am on current, supported versions of every part, at which point I will be better able to test and suggest a change against HEAD.

Thank you for your patience in keeping this issue open, I will follow up again soon.

Maybe it would be possible to add a feature to cucumber-reports that ignores failing and skipped steps altogether, rather than by threshold? At this point I'm not sure what other follow-up in the Cucumber Reports plugin is warranted. If a whole-build status were reported as failed in the JSON report file, then Cucumber Reports would be right to mark the build as failed after consuming the report, and that probably should not be configurable.

But if there is no "whole build status" value reported in the JSON, then I'm not sure there is enough information for Cucumber Reports to divine that it was a flaky test. (I still have to look at the report format to understand this better.) IIRC it would have to somehow pattern-match and recognize that a second test execution is a retry, as there didn't seem to be any way to tell directly by looking.

kingdonb commented 5 years ago

It's hard to argue strongly in favor of this, because what I'm really asking for is the ability to disregard flaky tests. It would be better to report on flaky tests and squash them sooner rather than later; currently not all of my tests are flaky, not even most, and those that are should be followed up and fixed. It's anathema to me that we would just permanently sweep them under the rug.

But with a test suite of thousands of tests, where fewer than five at a time (usually just one) are flaky as observed with the --retry 3 setting, and the cause is almost always a simple race condition in my test code itself, it's really tempting to do just that. I know I shouldn't; sooner or later one of those flaky tests is going to represent a real feature that actually only works some of the time and needs my attention.

damianszczepanik commented 5 years ago

I understand your point of view, and I can agree with it, as it is quite reasonable. My concern is that I don't want to manipulate the result; I will display results based on what is stored in the JSON file. I don't care why it was marked that way by cucumber.

kingdonb commented 5 years ago

I've reviewed a JSON report with a single flaky test in it, and I think I can see what's happening now.

Since the JSON report strictly doesn't contain any rollup statuses, and even the scenarios are not rolled up from what I can tell, I think cucumber-reports is doing a simple check, "is any step failing in this scenario?", to decide whether a scenario passed or failed, and a similar simple check for the build: "did any scenario fail?"

It would take cucumber-reports-plugin adding a "No Strict Flaky" option to match the --no-strict-flaky flag I've used in cucumber-ruby to generate the report with --retry. I don't think that's necessarily manipulation of the build status, although it's probably more complicated than before. In truth there is no manipulation, since no rollup status is provided in the report file at all, for either the scenario or the build. It's left up to the reporting plugin, or another interpreter, to judge by passing or failing steps whether any scenario (or build) has passed or failed.

It seems possible that this build, with only one flaky scenario, repeated twice and passing the second time, could safely be interpreted as a passing build without any manipulation of parameters or state. Here's my example results.json file:

https://gist.github.com/kingdonb/5ee427eb59af9668466afdc427ec1b38

Desired state:

  1. The results.json file (attached) contains a scenario, repeated twice: one execution of the scenario has a failing step, and the other has only passing steps. The repeated scenario is identifiable by its matching ID, so you can see that it was run more than once in this report:
     "id":"add-job-hourly-(ultratime);t1-will-require-a-valid-ultratime-supervisor"

  2. The user checks a "No strict flaky" option in the Cucumber Reports Plugin options for the build.

  3. The report shows the scenario executing twice, with one passing result and one failing result. (Note: this is the current state. I don't want to change this at all; it's perfect. This way users can still follow up flaky reports and nothing is hidden.)

  4. The rollup build status is reported as passing, because the user enabled "No strict flaky" and the only tests that failed were re-executed per the --retry option and later passed.

It might be possible to infer that the user wants no-strict-flaky, but since cucumber-ruby didn't assume that and required an opt-in, I would not ask you to infer it. Presented as an option, maybe it would be a better choice for the user than allowing an arbitrary number of failed or skipped steps?
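The proposed rollup can be sketched as follows. This is only an illustration of the requested option, not the plugin's implementation, and it assumes that retried attempts share a scenario id, as described in point 1 above:

```ruby
require 'json'

# Illustrative report: one scenario run twice under --retry (same id),
# failing on the first attempt and passing on the retry. The id and
# step data here are made up for the example.
report = JSON.parse(<<~DATA)
  [
    { "elements": [
      { "id": "add-job;t1", "steps": [ { "result": { "status": "failed" } } ] },
      { "id": "add-job;t1", "steps": [ { "result": { "status": "passed" } } ] }
    ] }
  ]
DATA

attempts = report.flat_map { |feature| feature["elements"] }

passed = lambda do |scenario|
  scenario["steps"].all? { |step| step["result"]["status"] == "passed" }
end

# "No strict flaky" rollup: group attempts by scenario id, and count a
# scenario as passed if at least one attempt passed. The build passes
# when every scenario id has at least one passing attempt.
by_id = attempts.group_by { |sc| sc["id"] }
build_passed = by_id.values.all? { |runs| runs.any?(&passed) }

puts(build_passed ? "SUCCESS" : "FAILURE")
```

Both attempts would still be shown in the report (the current behavior); only the rollup status changes, which matches desired-state points 3 and 4.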

damianszczepanik commented 5 years ago

The solution from the last paragraph is already implemented. I'm not sure about the ID rules, as what you observed is based on the Ruby implementation; whether it is common to the other implementations, I'm not sure.

vlokesh09 commented 1 year ago

@kingdonb I have the same issue: both the failed and the retried test are reported, basically inflating the numbers of total and failed tests. Were you able to figure out a solution? I could not find any option I can provide to the reporter JS to ignore the flaky failed tests...

damianszczepanik commented 1 year ago

This may help https://github.com/damianszczepanik/cucumber-reporting/blob/master/src/main/java/net/masterthought/cucumber/reducers/ReducingMethod.java