sw-tools / checkin-service

Checks the user in to a flight
MIT License
9 stars 3 forks source link

Puppeteer timeout on checkin #23

Open Rezer opened 2 years ago

Rezer commented 2 years ago

I set this up recently and it has worked properly for one flight, but I just got this in my error log today (the flight checkin time is 1:35 UTC, so it woke up at the appropriate time):

START RequestId: a4483461-12d6-5170-b06f-6e900d55117c Version: $LATEST

2022-05-08T01:30:50.748Z    a4483461-12d6-5170-b06f-6e900d55117c    INFO    Received SQS message {
  reservation: {
    confirmation_number: 'XXXXX',
    first_name: 'Removed',
    last_name: 'Name'
  },
  checkin_available_epoch: 1651973700
}

2022-05-08T01:31:23.709Z    a4483461-12d6-5170-b06f-6e900d55117c    ERROR   TimeoutError: Navigation timeout of 30000 ms exceeded
    at /opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111

2022-05-08T01:31:23.749Z    a4483461-12d6-5170-b06f-6e900d55117c    ERROR   Invoke Error    {
    "errorType": "TimeoutError",
    "errorMessage": "Navigation timeout of 30000 ms exceeded",
    "name": "TimeoutError",
    "stack": [
        "TimeoutError: Navigation timeout of 30000 ms exceeded",
        "    at /opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111"
    ]
}

END RequestId: a4483461-12d6-5170-b06f-6e900d55117c

REPORT RequestId: a4483461-12d6-5170-b06f-6e900d55117c  Duration: 33007.91 ms   Billed Duration: 33008 ms   Memory Size: 1600 MB    Max Memory Used: 529 MB Init Duration: 562.47 ms    

Seems to be just an issue with the initial API token page not loading fast enough through puppeteer. I'm unsure if this means the API token page changed or if there just needs to be a retry configured for this part of the process.

Rezer commented 2 years ago

So I narrowed it down to the SwGenerateHeaders.generateHeaders() function timing out, and threw together a retry method for both basic and advanced headers within HandleScheduledCheckin. I managed to reproduce the error a few times, once again with the default 30 second timeout and for giggles one with a 10 second timeout. They each failed once then managed to continue on the second try. Here's a log of the updated behavior:

START RequestId: 8ba376f3-4c3d-568c-92d6-2c021cb71ea0 Version: $LATEST

2022-05-08T05:00:05.001Z    8ba376f3-4c3d-568c-92d6-2c021cb71ea0    INFO    Received SQS message {
  reservation: {
    confirmation_number: 'XXXXXX,
    first_name: 'Removed',
    last_name: 'Name'
  },
  checkin_available_epoch: 1651986000
}

2022-05-08T05:00:05.158Z    8ba376f3-4c3d-568c-92d6-2c021cb71ea0    DEBUG   basicHeaders {
  Host: 'mobile.southwest.com',
  'Content-Type': 'application/json',
  'X-API-Key': 'l7xx2c186c1297274b828b1872e22edfe67a',
  'X-User-Experience-Id': 'B99E5460-CE8B-11EC-8EF8-95E982AD8E52',
  Accept: '*/*',
  'X-Channel-ID': 'MWEB'
}

2022-05-08T05:00:38.652Z    8ba376f3-4c3d-568c-92d6-2c021cb71ea0    DEBUG   Failed getting advanced headers.  Retrying in 5s...
Exception details: TimeoutError: Navigation timeout of 30000 ms exceeded

2022-05-08T05:18:15.588Z    e21124ba-47da-5c2f-baef-7e4b137274de    DEBUG   advancedHeaders {
  'content-type': 'application/json',
  ...
  (execution continues normally from this point)

Unsure if it would be better to simply raise the timeout within generateHeaders() by using page.setDefaultNavigationTimeout(60000) or just keep relying on the retry to succeed, since it seemed to work on the second attempt even with a 10s timeout instead of 30.

I'm not exactly fluent in typescript but I'll submit a pull request with the changes I made.

swtools0 commented 2 years ago

Thanks for submitting this and your pull request!

This type of timeout stuff is sort of a pain point with my lack of understanding of Puppeteer. You'll see that I don't exactly know how to use Puppeteer properly by my random waitMs here, particularly in a Lambda environment where Puppeteer seems to behave a bit differently from locally. I think retrying is sort of a band-aid over whatever the problem is, but it seems like a sensible solution. If you make some small changes to your PR, I'll be happy to merge it in.

Rezer commented 2 years ago

Sorry kind of lost track of this. I've since come up with a better solution but I haven't had much chance to test it, given the somewhat unique set of requirements for testing this one. I'll see about cleaning it up and submitting another pull request soonish.