sanitizers / patchback-github-app

https://github.com/apps/patchback
GNU General Public License v3.0

Bot stopped working #27

Closed felixfontein closed 2 years ago

felixfontein commented 2 years ago

The bot stopped working. It still worked on Saturday IIRC, but since Monday it definitely no longer works.

webknjaz commented 2 years ago

/me looks

webknjaz commented 2 years ago

This is from yesterday's log highlights (copied from an email Coralogix monitoring sent me):

Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch

I wonder if something in the GH API changed that caused this (as has happened in the past)... For some reason I also can't log into the Heroku account. But that's probably unrelated (maybe it's related to their recent security breach...)

webknjaz commented 2 years ago

Heroku will start resetting user account passwords today, May 4, 2022, as mentioned in our previous notification.

Oh.

webknjaz commented 2 years ago

I can't log into the Heroku account

(hacker voice) I'm in!

webknjaz commented 2 years ago

Hm... This is interesting too. I'm gonna check if there's anything I could stop there.

[Action Required] - Your app(s) using free dyno hours will stop running soon

webknjaz commented 2 years ago

All the logs I'm getting look like:

2022-05-17T15:03:18.423011+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2022-05-17T15:03:18.474082+00:00 heroku[web.1]: Stopping process with SIGKILL
2022-05-17T15:03:18.660153+00:00 heroku[web.1]: Process exited with status 137
2022-05-17T15:03:18.928595+00:00 heroku[web.1]: State changed from starting to crashed
2022-05-17T15:03:18.931708+00:00 heroku[web.1]: State changed from crashed to starting
2022-05-17T15:03:22.648456+00:00 heroku[web.1]: Starting process with command python -m patchback

And there's no extra output hinting at what's going on. It just says "crashed" while there are no logs of it even starting to load. Also, the GH integration seems to be dead. The "Deploy from GitHub" button doesn't work. I removed the integration, tried re-adding it, and approved the authorization on the GH side, but in the Heroku dashboard it just errors out...

All signs are pointing at Heroku being drunk so far :man_shrugging:

The pop-ups I'm seeing are as follows:

Item could not be retrieved: Internal Server Error

webknjaz commented 2 years ago

Can't deploy through git push, can't spawn any one-off dynos either.

webknjaz commented 2 years ago

I left heroku run --app patchback bash open for a while and when I came back I saw that I actually got a shell. Plus, the top of the log said something about a connection timeout to GitHub. I remembered that I had some network-bound stuff in .profile and I was right:

$ head -n1 .profile
export GIT_TAGS=$(git ls-remote --tags git://github.com/sanitizers/patchback-github-app.git | grep -v '\^{}$' | grep -E "^${SOURCE_VERSION}")

GitHub recently dropped support for the unauthenticated git:// protocol, which may explain the dyno startup timing out. I'll try to replace it with https:// and hope that this is it.
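
A minimal sketch of what a more defensive version of that .profile line could look like, assuming GNU coreutils' timeout is available on the dyno. The function names and the 10-second limit are made up for illustration and are not from the repository; the URL and the two grep filters are the ones from the line above, with git:// swapped for https://.

```shell
# Illustrative .profile fragment; names and the timeout value are assumptions.
REPO_URL='https://github.com/sanitizers/patchback-github-app.git'

# Keep only non-peeled tag refs whose SHA starts with the given prefix
# (the same two grep filters as in the original one-liner).
filter_tags() {
  grep -v '\^{}$' | grep -E "^${1}"
}

# Query the remote for tags, but give up after 10 seconds so an
# unreachable remote cannot stall the session indefinitely.
lookup_tags() {
  timeout 10 git ls-remote --tags "$REPO_URL" 2>/dev/null | filter_tags "$1"
}

# In .profile this might then be used as:
#   GIT_TAGS=$(lookup_tags "${SOURCE_VERSION}" || true)
#   export GIT_TAGS
```

With a bounded lookup like this, a dead remote costs at most the timeout and leaves GIT_TAGS empty instead of holding up the shell.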

webknjaz commented 2 years ago

Nope, that didn't help...

webknjaz commented 2 years ago

Although, since there's

2022-05-17T16:03:54.493439+00:00 heroku[web.1]: Starting process with command python -m patchback

in the logs, I suppose the problem is further down the road. Let me try to enable debug, then...

webknjaz commented 2 years ago

Oh, and it's worth mentioning that locally the app starts up just fine. (Although, it's Python 3.9 here)

webknjaz commented 2 years ago

[...] the log said something about the connection timeout to GitHub. [...]

This is what it looks like after waiting for ~2.5 min:

$ heroku run --app patchback bash
Running bash on ⬢ patchback... up, run.3085 (Free)
fatal: unable to connect to github.com:
github.com[0: 140.82.121.4]: errno=Connection timed out

~ $

webknjaz commented 2 years ago

The app starts up in a one-off dyno without any problems:

~ $ python -m patchback
/app/.heroku/python/lib/python3.8/site-packages/envparse.py:195: UserWarning: Could not any envfile.
  warnings.warn('Could not any envfile.')
DEBUG:octomachinery.app.server.runner:================ App version: 0.1.dev0+gcee099765db205a20b3bfbaa38b90a5c86dec430 =================
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:octomachinery.app.server.machinery:The GitHub App env is set to `prod`
INFO:octomachinery.app.server.machinery:Webhook secret (8...3) is [SET]: SIGNATURE VERIFICATION WILL BE ENFORCED
INFO:octomachinery.app.server.machinery:Starting the following GitHub App:
INFO:octomachinery.app.server.machinery:* app id: 21488
INFO:octomachinery.app.server.machinery:* private key SHA-1 fingerprint: 2f:d9:5a:96:d8:d4:8d:fe:e6:71:19:b9:d5:3a:41:2d:90:59:11:fb
INFO:octomachinery.app.server.machinery:* user agent: Patchback-Bot/0.1.dev0+gcee099765db205a20b3bfbaa38b90a5c86dec430 (+https://github.com/apps/patchback)
[...]

Which reassures me that the Heroku platform is to blame. Not sure what they run at the beginning of the session but it doesn't seem related to Patchback itself...

webknjaz commented 2 years ago

Interesting... It appears that their Git-over-SSH is broken but git push https://git.heroku.com/patchback.git HEAD managed to trigger a deployment. I believe it works now but I'll keep this issue open for some time: I still don't understand why some Git/GitHub bits of the interaction with Heroku/Dashboard are broken.

felixfontein commented 2 years ago

Thanks for fixing this! In any case, it works quite well again right now.

webknjaz commented 2 years ago

Looks like another bot hosted on Heroku is suffering the same issue.

webknjaz commented 2 years ago

Chronographer also needed %s/git:/https:/g and a manual push. It seems like there are two distinct problems here:
1) Having git:// leftovers in some apps makes Heroku get stuck and eventually time out. Having this fixed and manually redeployed also makes heroku run --app patchback bash like 4x faster without that I/O wait.
2) Heroku's integration with GitHub is somehow broken since it still errors out with an Internal Server Error when I try to do anything about it.
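
The %s/git:/https:/g substitution mentioned above is a Vim command; a non-interactive equivalent could look roughly like this. The function name is hypothetical, and the pattern is narrowed to git:// so that unrelated occurrences of "git:" are left alone.

```shell
# Hypothetical helper: rewrite git:// URLs to https:// in the given file.
rewrite_git_urls() {
  file=$1
  tmp=$(mktemp) || return 1
  # Write to a temporary file first so a failed sed never truncates the
  # original; copying back with cat keeps the file's permissions.
  sed 's|git://|https://|g' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Something like rewrite_git_urls .profile would then cover the file discussed in this thread; the change still has to be committed and pushed manually for Heroku to pick it up.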

webknjaz commented 2 years ago

Oh... Apparently, they've just disabled the GH integration until they're done investigating (emphasis mine):

Update

Based on current progress, we plan to complete our investigation by May 30, 2022. We are continuing with remediation activities and plan to publish additional information about the incident once it’s resolved.

At this time we have seen no additional OAuth tokens compromised beyond what was reported on April 15, 2022. GitHub has contacted all customers they identified as affected by the issue. Heroku completed the necessary password resets on May 5, 2022. We have no evidence of any unauthorized access to Heroku systems since April 14, 2022. In the event we notify customers directly, the email communication will be sent from Salesforce (techcomms@mail.salesforce.com).

We know you are waiting for us to re-enable our integration with GitHub, and we've committed to you that we will only do so following a security review. We will post more information to status.heroku.com when it is available.

Posted 21 hours ago, May 18, 2022 00:38 UTC

(https://status.heroku.com/incidents/2413)

Closing this issue, then. A workaround has been found, and triggering deploys manually is good enough for me for now. Maybe at some point I'll get to integrating this into GHA but it's not a top priority rn.