nightwatchjs / nightwatch

Integrated end-to-end testing framework written in Node.js and using W3C Webdriver API. Developed at @browserstack
https://nightwatchjs.org
MIT License

Tests randomly stop without running assertions if launched via Jenkins CI with test_workers set to true / auto #1374

Closed GrayedFox closed 7 years ago

GrayedFox commented 7 years ago

After much debugging and combing through massive amounts of Selenium server output and Node HTTP debug output (with the NODE_DEBUG=http flag set), I think I've narrowed this problem down to Nightwatch itself.

Running tests on our CI server via Jenkins has shown that using test_workers is inherently unstable. The output is not helpful: a test will begin and then finish without running any assertions (and without any other output from that test), which results in the build failing. All the extra output from verbose logging and from setting NODE_DEBUG just pollutes the log, without any persistent errors appearing.

The most annoying part is that this error never manifests locally. It does not happen on my Mac, on my colleague's Linux (Ubuntu LTS) machine, or for any of our other team members (6 people with different workstations). It only happens when trying to get NW to run inside our CI process with Jenkins.

nightwatch.json:

// fails 9 out of 10 builds on Jenkins
"request_timeout_options": {
  "timeout": 15000,
  "retry_attempts": 5
},
"test_workers": {
  "enabled": true,
  "workers": "auto"
},

The (not very useful and long) output:

[acceptance-nemesis] ------>  Lets test...!
[acceptance-nemesis] Started child process for: tests/affiliateLandingPages/prefilledAffiliateLandingPage 
[acceptance-nemesis] Started child process for: tests/homePage/homePagePartnerLogos 
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   \n
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   [Affiliate Landing Pages / Prefilled Affiliate Landing Page] Test Suite
[acceptance-nemesis] ===========================================================================
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   Domain: localhost:3000/de
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   Results for:  Test prefilled date on affiliate landing page
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   ✔ Waiting for booking form visibility
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   ✔ Testing correct prefilled date
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   OK. 2 assertions passed. (4.632s)
[acceptance-nemesis] 
[acceptance-nemesis]   >> tests/affiliateLandingPages/prefilledAffiliateLandingPage finished.  
[acceptance-nemesis] 
[acceptance-nemesis] Started child process for: tests/regularUser/regularUserCancelsRideFree 
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   \n
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   [Regular User / Regular User Cancels Ride Free] Test Suite
[acceptance-nemesis] ==============================================================
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   Domain: localhost:3000/fr
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   Results for:  User can cancel transfer for free if driver unassigned
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for login link to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for flash message to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Testing if element <.flash-message> contains text: "Connexion réussie".
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for booking-form pickup input to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for booking-form dropoff input to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for transfer search button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for priceDetails select button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for skip button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree    ! Formatted PAYMENT METHOD selector with 2
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for select button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for main book now button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for confirmation booking-uuid to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree    ! Formatted UUID selector with dd7fe266-0162-41ca-957b-31c1aea34f47
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for booking number to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for cancel button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for confirm button to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Waiting for booking number to be visible
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Testing that cancel button is disabled
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   ✔ Testing that status is cancelled
[acceptance-nemesis]  tests/regularUser/regularUserCancelsRideFree   OK. 17 assertions passed. (39.586s)
[acceptance-nemesis] 
[acceptance-nemesis]   >> tests/regularUser/regularUserCancelsRideFree finished.  
[acceptance-nemesis] 
[acceptance-nemesis] Started child process for: tests/staticPages/staticPagesAbout 
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   \n
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   [Static Pages / Static Pages About] Test Suite
[acceptance-nemesis] ==================================================
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   Domain: localhost:3000/en
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   Results for:  Important elements and images are visible on the about page
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   ✔ Waiting for header to be visible
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   ✔ Testing if header contains text ABOUT BLACKLANE
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   ✔ Waiting for team header to be visible
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   ✔ Testing if team header contains text MEET THE TEAM
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   ✔ Testing if founder image 3 visible
[acceptance-nemesis]  tests/staticPages/staticPagesAbout   OK. 5 assertions passed. (5.648s)
[acceptance-nemesis] 
[acceptance-nemesis]   >> tests/staticPages/staticPagesAbout finished.  
[acceptance-nemesis] 
[acceptance-nemesis] Started child process for: tests/tracking/trackingEvents 
[acceptance-nemesis]   >> tests/homePage/homePagePartnerLogos finished.  

Notice how the homePagePartnerLogos test finishes a long time after it was started (at the top of the log) without any other output? This is the essential problem. One random test (sometimes several) will fail in this same way: it will start, other tests will run and pass in between, and then the test that started earlier will finish without any output. Nothing in the extra debug info indicates anything out of the ordinary (no errors, no warnings, just INFO level logs).
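For anyone trying to spot this symptom in a long console log, here is a minimal sketch (not part of the original report) that flags tests which were started and reported as finished without printing anything in between. The line patterns are assumptions inferred from the log excerpt above, and the function name `find_silent_tests` is made up for illustration.

```python
import re

# Assumed line shapes, inferred from the log excerpt in this issue.
# The "[acceptance-nemesis]" label looks like a Jenkins job prefix, so it is optional.
PREFIX = r"^(?:\[[^\]]*\])?\s*"
STARTED = re.compile(PREFIX + r"Started child process for:\s*(\S+)")
FINISHED = re.compile(PREFIX + r">>\s*(\S+)\s+finished\.")
# A worker output line: "<test path>   <some text>"
OUTPUT = re.compile(PREFIX + r"(\S+)\s{2,}\S")


def find_silent_tests(log_text):
    """Return tests that started and finished without any output of their own."""
    started = []        # test names in the order they were started
    output_count = {}   # test name -> number of output lines seen
    finished = set()

    for line in log_text.splitlines():
        match = STARTED.match(line)
        if match:
            name = match.group(1)
            if name not in output_count:
                started.append(name)
                output_count[name] = 0
            continue
        match = FINISHED.match(line)
        if match:
            finished.add(match.group(1))
            continue
        match = OUTPUT.match(line)
        if match and match.group(1) in output_count:
            output_count[match.group(1)] += 1

    return [t for t in started if t in finished and output_count[t] == 0]


SAMPLE = """\
[acceptance-nemesis] Started child process for: tests/affiliateLandingPages/prefilledAffiliateLandingPage
[acceptance-nemesis] Started child process for: tests/homePage/homePagePartnerLogos
[acceptance-nemesis]  tests/affiliateLandingPages/prefilledAffiliateLandingPage   OK. 2 assertions passed. (4.632s)
[acceptance-nemesis]   >> tests/affiliateLandingPages/prefilledAffiliateLandingPage finished.
[acceptance-nemesis] Started child process for: tests/tracking/trackingEvents
[acceptance-nemesis]   >> tests/homePage/homePagePartnerLogos finished.
"""

if __name__ == "__main__":
    for name in find_silent_tests(SAMPLE):
        print("silent test:", name)
```

Run against the sample, this reports only tests/homePage/homePagePartnerLogos. It only identifies which tests went silent; it says nothing about why the worker produced no output.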

Extra Info

During our debugging I found some other annoying bugs, which I will link here in case it helps others:

  1. Be careful when archiving artifacts with Jenkins, as it can silently fail your build: https://issues.jenkins-ci.org/browse/JENKINS-38005

  2. If you start your Selenium server manually (and not using the built-in Nightwatch way) you may run into this bug: https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/2492. I tried using pkill java as a workaround but this only sometimes worked. Luckily, Nightwatch seems to handle the cleanup of Selenium properly, and I do not experience this bug if I let Nightwatch manage Selenium.

  3. We are still having failures (maybe 2 out of 5 builds) due to this: https://github.com/nightwatchjs/nightwatch/issues/1083. I'm not sure what to do here except wait for a fix, and would really appreciate feedback on a workaround. Based on comments in that thread, I will ask our infrastructure guys if we can reinstall Java on the Jenkins machines. Now trying this workaround: http://stackoverflow.com/questions/41487659/nightwatch-selenium-socket-hang-up

beatfactor commented 7 years ago

I'm afraid I'm going to go ahead and close this. Sorry for the inconvenience, but I don't think I can help you with it. What you are describing is a very complicated scenario and one that is extremely hard to reproduce. I think it's up to you to continue debugging this and if there's a more concrete issue which needs fixing in Nightwatch, please open a new ticket and reference this one.

You might also get more help by posting on the Mailing List.

GrayedFox commented 7 years ago

The solution we went with was containerisation. Honestly, I understand why you would close this. It was a nightmare of a bug, caused by local and remote machines differing in various ways that are subtle and difficult to spot.

Go Docker.

karthikiyengar commented 7 years ago

I believe I have the same issue. I'm using workers in a Docker container and running my tests on CircleCI.

OK. 112  total assertions passed. (2m 22s)
Exited with code 1

Nightwatch exits for some weird reason without running all of the assertions (I have no failing assertions).

Any solutions? :cry: