mozilla / fxa

Monorepo for Mozilla Accounts (formerly Firefox Accounts)
https://mozilla.github.io/ecosystem-platform/
Mozilla Public License 2.0
590 stars, 212 forks

Pairing tests sometimes fail on CircleCI, is CircleCI being rate limited by the channelserver? #898

Closed: shane-tomlinson closed this issue 5 years ago

shane-tomlinson commented 5 years ago

See https://screencap.co.uk/images/66e8761d154de9cd9a7f05a159a9485267d6bb96.png

cc @vladikoff and @jrconlin

jrconlin commented 5 years ago

@jbuck I don't believe you have rate limiting currently set for channel server, right?

@shane-tomlinson, is there any way to capture the cause of the failure (e.g. a connection timeout, a 40x response, etc.)?
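One way to answer this in CI would be to bucket each failed request into a coarse cause before logging it. This is a minimal sketch assuming the error object carries either a Node.js network error `code` (such as `ETIMEDOUT`) or an HTTP `statusCode`; `classifyFailure` is a hypothetical helper, not part of the fxa test suite or the channel server.

```javascript
// Hypothetical helper (not from fxa or the channel server): map a failed
// request to a coarse cause so CI logs say more than "it failed".
// Assumes `err.code` for Node.js network errors and `err.statusCode`
// for HTTP-level failures.
function classifyFailure(err) {
  if (!err) return 'unknown';

  // Node.js system errors expose a string `code`.
  if (err.code === 'ETIMEDOUT') return 'connection timeout';
  if (err.code === 'ECONNREFUSED' || err.code === 'ECONNRESET') {
    return 'connection refused/reset';
  }

  const status = err.statusCode;
  if (status === 429) return 'rate limited (429)';
  if (status >= 400 && status < 500) return `client error (${status})`;
  if (status >= 500) return `server error (${status})`;
  return 'unknown';
}
```

A 429 would point at rate limiting, while a timeout with no HTTP status would fit the network-saturation theory better.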

jrconlin commented 5 years ago

FWIW, WebPush would sometimes also get similar errors when build tests failed to connect to the WebPush production servers. Our servers reported no errors, but we believed a potential cause was that the test machines ran many tests in parallel and their IO nodes would saturate, leaving no or only limited outbound network connections. This would turn the tree orange, and eventually the WebPush tests were de-prioritized. I don't know if that's the case here, but I don't have much information about what the machines running the tests are doing and seeing.

shane-tomlinson commented 5 years ago

@jbuck or @jrgm, is there any chance the dev channel server (dev.channelserver.nonprod.cloudops.mozgcp.net) is rate limited per IP, or that the iprepd integration is set up and could be blocking Circle? I'm trying to figure out why the pairing tests fail intermittently. It looks like an iprepd server can be specified here: https://github.com/mozilla-services/pairsona/blob/b4e94be268c51a0d4a31ff967f7dc1d8041a2616/channelserver/src/settings.rs#L44

@vladikoff or @eoger is there any way to see the browser log in Circle?

vladikoff commented 5 years ago

There is a 5-minute timeout; if that runs out, you would get that screen. How long does it sit on that screen?

shane-tomlinson commented 5 years ago

@vladikoff, here's one example: https://circleci.com/gh/mozilla/fxa/9250

firefox on linux 4.4.0-144-generic - pairing - it can pair (11.724s)
    Error: QRTimeout: Error
      at getQrData.then.catch  <tests/functional/pairing.js:85:37>
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Timeout._onTimeout  <tests/functional/pairing.js:89:19>
      at ontimeout  <timers.js:436:11>
      at tryOnTimeout  <timers.js:300:5>
      at listOnTimeout  <timers.js:263:5>
      at Timer.processTimers  <timers.js:223:10>
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Timeout._onTimeout  <tests/functional/pairing.js:89:19>
      at ontimeout  <timers.js:436:11>
      at tryOnTimeout  <timers.js:300:5>
      at listOnTimeout  <timers.js:263:5>
      at Timer.processTimers  <timers.js:223:10>
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Command.<anonymous>  <tests/functional/pairing.js:97:10>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:60:23>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:73:18>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:68:10>
      at Test.it can pair [as test]  <tests/functional/pairing.js:118:10>
      at <src/lib/Test.ts:263:51>
      at Test.it can pair [as test]  <tests/functional/pairing.js:122:15>
      at <src/lib/Test.ts:263:51>

The screencap does show a QR code, though, so that seems strange.

vladikoff commented 5 years ago

Ah, based on

Error
Error
Error

That means it tried to read the QR code 3 times and failed each time, which probably means the QR reader could not decode it.
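To make that reading concrete, here is a minimal sketch of a poll-and-retry QR reader, assuming three attempts with a fixed delay between them. `pollForQrData` and `readQr` are hypothetical names; this is not the code in `tests/functional/pairing.js`, only an illustration of how three logged decoder messages end in a single `QRTimeout`.

```javascript
// Hypothetical sketch, not the actual fxa test code.
const MAX_ATTEMPTS = 3;        // assumption: three read attempts, as described above
const POLL_INTERVAL_MS = 1000; // assumption: delay between attempts

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// `readQr` is a hypothetical async function that screenshots the page and
// resolves with the decoded QR payload, or rejects with a decoder error.
async function pollForQrData(readQr, attempts = MAX_ATTEMPTS, intervalMs = POLL_INTERVAL_MS) {
  let lastMessage = 'Error';
  for (let i = 0; i < attempts; i++) {
    try {
      return await readQr();
    } catch (err) {
      // An Error with an empty message stringifies to just "Error",
      // which would explain the bare "Error" lines in the log.
      lastMessage = err && err.message ? err.message : String(err);
      console.log(lastMessage);
      if (i < attempts - 1) {
        await sleep(intervalMs);
      }
    }
  }
  // Every attempt failed: report the last decoder message.
  throw new Error(`QRTimeout: ${lastMessage}`);
}
```

With an empty-message decoder error on every attempt, this logs `Error` three times and then rejects with `QRTimeout: Error`.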

shane-tomlinson commented 5 years ago

@vladikoff, you mentioned in the channel last night that the channel server was down:

Could that be the cause?

Here is a different error from today:

https://circleci.com/gh/mozilla/fxa/9652?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link

Couldn't find enough finder patterns:0 patterns found
Couldn't find enough finder patterns:0 patterns found
Couldn't find enough finder patterns:0 patterns found
Screenshot saved at: https://screencap.co.uk/images/53f5cc079ccbb6570ffdd817dc5c85712afd798e.png
× firefox on linux 4.4.0-144-generic - pairing - it can pair (11.909s)
    Error: QRTimeout: Couldn't find enough finder patterns:0 patterns found
      at getQrData.then.catch  <tests/functional/pairing.js:85:37>
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Timeout._onTimeout  <tests/functional/pairing.js:89:19>
      at ontimeout
      at tryOnTimeout
      at listOnTimeout
      at Timer.processTimers
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Timeout._onTimeout  <tests/functional/pairing.js:89:19>
      at ontimeout
      at tryOnTimeout
      at listOnTimeout
      at Timer.processTimers
      at pollForScreenshot  <tests/functional/pairing.js:75:8>
      at Command.<anonymous>  <tests/functional/pairing.js:97:10>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:60:23>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:73:18>
      at Command.<anonymous>  <tests/functional/lib/helpers.js:68:10>
      at Test.it can pair [as test]  <tests/functional/pairing.js:118:10>
      at <src/lib/Test.ts:263:51>
      at Test.it can pair [as test]  <tests/functional/pairing.js:122:15>
      at <src/lib/Test.ts:263:51>

With an image that looks strikingly similar:

shane-tomlinson commented 5 years ago

Ref https://bugzilla.mozilla.org/show_bug.cgi?id=1546351

vladikoff commented 5 years ago

also https://bugzilla.mozilla.org/show_bug.cgi?id=1546209

clouserw commented 5 years ago

We'll close this and assume it was related to the server downtime.