Closed felixfontein closed 2 years ago
/me looks
This is from yesterday's log highlights (copied from an email Coralogix monitoring sent me):
Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
I wonder if something in the GH API changed that caused this (as has happened in the past)... For some reason I can't log into the Heroku account either. But that's probably unrelated (maybe it's related to their recent security breach...)
Heroku will start resetting user account passwords today, May 4, 2022, as mentioned in our previous notification.
Oh.
I can't log into the Heroku account
(hacker voice) I'm in!
Hm... This is interesting too. I'm gonna check if there's anything I could stop there.
[Action Required] - Your app(s) using free dyno hours will stop running soon
All the logs I'm getting look like:
2022-05-17T15:03:18.423011+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2022-05-17T15:03:18.474082+00:00 heroku[web.1]: Stopping process with SIGKILL
2022-05-17T15:03:18.660153+00:00 heroku[web.1]: Process exited with status 137
2022-05-17T15:03:18.928595+00:00 heroku[web.1]: State changed from starting to crashed
2022-05-17T15:03:18.931708+00:00 heroku[web.1]: State changed from crashed to starting
2022-05-17T15:03:22.648456+00:00 heroku[web.1]: Starting process with command python -m patchback
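For anyone skimming a longer log dump for the same symptom, here is a minimal sketch that pulls the platform error codes out of log lines shaped like the ones above. The function name `find_platform_errors` and both regular expressions are my own assumptions based only on this sample, not on any documented Heroku log grammar.

```python
import re

# Assumed line shape, inferred from the sample above:
# "<ISO timestamp> <source>[<dyno>]: <message>"
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}) "
    r"(?P<source>\w+)\[(?P<dyno>[\w.]+)\]: "
    r"(?P<message>.*)"
)
# Assumed error shape: "Error R10 (Boot timeout) ..."
ERROR_CODE = re.compile(r"Error (?P<code>[A-Z]\d+) \((?P<name>[^)]+)\)")


def find_platform_errors(log_text):
    """Return (timestamp, code, name) for each line that reports an error code.

    Hypothetical helper: lines that do not fit the assumed shape are skipped.
    """
    errors = []
    for line in log_text.splitlines():
        entry = LOG_LINE.match(line)
        if entry is None:
            continue
        error = ERROR_CODE.search(entry["message"])
        if error:
            errors.append((entry["ts"], error["code"], error["name"]))
    return errors
```

Feeding it the excerpt above should flag only the first line; the SIGKILL and state-change lines carry no error code and are ignored.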
And there's no extra output hinting at what's going on. It just says "crashed" while there are no logs of it even starting to load. Also, the GH integration seems to be dead. The "Deploy from GitHub" button doesn't work. I removed the integration, tried re-adding it, and approved the authorization on the GH side, but in the Heroku dashboard it just errors out...
All signs are pointing at Heroku being drunk so far :man_shrugging:
The pop-ups I'm seeing are as follows:
Item could not be retrieved: Internal Server Error
Can't deploy through git push, can't spawn any one-off dynos either.
I left heroku run --app patchback bash open for a while and when I came back I saw that I actually got a shell. Plus, the top of the log said something about the connection timeout to GitHub. I remembered that I had some network-bound stuff in .profile and I was right:
$ head -n1 .profile
export GIT_TAGS=$(git ls-remote --tags git://github.com/sanitizers/patchback-github-app.git | grep -v '\^{}$' | grep -E "^${SOURCE_VERSION}")
Recently, GitHub has dropped support for the unauthenticated git:// protocol, which may explain the dyno startup timing out. I'll try to replace it with https:// and hope that this is it.
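To spell out what that .profile line does and what the change amounts to, here is a rough Python equivalent. Both function names are hypothetical and this is only a sketch of the URL swap plus the two grep filters, not what Heroku or the app actually runs.

```python
import re


def to_https(url):
    """Rewrite a git://github.com/ URL to https://github.com/.

    Hypothetical helper; any other URL is returned unchanged.
    """
    return re.sub(r"^git://github\.com/", "https://github.com/", url)


def tags_for_commit(ls_remote_output, commit_sha):
    """Approximate the two grep filters from the .profile line.

    Drops peeled entries (lines ending in "^{}") and keeps only lines
    that start with commit_sha, which stands in for $SOURCE_VERSION.
    """
    return [
        line
        for line in ls_remote_output.splitlines()
        if not line.endswith("^{}") and line.startswith(commit_sha)
    ]
```

So `to_https("git://github.com/sanitizers/patchback-github-app.git")` gives the https:// form of the same repository URL, and `tags_for_commit` expects text in the format printed by `git ls-remote --tags`.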
Nope, that didn't help...
Although, since there's
2022-05-17T16:03:54.493439+00:00 heroku[web.1]: Starting process with command python -m patchback
in the logs, I suppose the problem is further down the road. Let me try to enable debug, then...
Oh, and it's worth mentioning that locally the app starts up just fine. (Although, it's Python 3.9 here)
[...] the log said something about the connection timeout to GitHub. [...]
This is what it looks like after waiting for ~2.5 min:
$ heroku run --app patchback bash
Running bash on ⬢ patchback... up, run.3085 (Free)
fatal: unable to connect to github.com:
github.com[0: 140.82.121.4]: errno=Connection timed out
~ $
The app starts up in a one-off dyno without any problems:
~ $ python -m patchback
/app/.heroku/python/lib/python3.8/site-packages/envparse.py:195: UserWarning: Could not any envfile.
warnings.warn('Could not any envfile.')
DEBUG:octomachinery.app.server.runner:================ App version: 0.1.dev0+gcee099765db205a20b3bfbaa38b90a5c86dec430 =================
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:octomachinery.app.server.machinery:The GitHub App env is set to `prod`
INFO:octomachinery.app.server.machinery:Webhook secret (8...3) is [SET]: SIGNATURE VERIFICATION WILL BE ENFORCED
INFO:octomachinery.app.server.machinery:Starting the following GitHub App:
INFO:octomachinery.app.server.machinery:* app id: 21488
INFO:octomachinery.app.server.machinery:* private key SHA-1 fingerprint: 2f:d9:5a:96:d8:d4:8d:fe:e6:71:19:b9:d5:3a:41:2d:90:59:11:fb
INFO:octomachinery.app.server.machinery:* user agent: Patchback-Bot/0.1.dev0+gcee099765db205a20b3bfbaa38b90a5c86dec430 (+https://github.com/apps/patchback)
[...]
Which reassures me that the Heroku platform is to blame. Not sure what they run at the beginning of the session but it doesn't seem related to Patchback itself...
Interesting... It appears that their Git-over-SSH is broken but git push https://git.heroku.com/patchback.git HEAD managed to trigger a deployment. I believe it works now but I'll keep this issue open for some time, since I still don't understand why some Git/GitHub bits of the interaction with Heroku/Dashboard are broken.
Thanks for fixing this! In any case, it works quite well again right now.
Looks like another bot hosted on Heroku is suffering the same issue.
Chronographer also needed %s/git:/https:/g and a manual push. It seems like there are two distinct problems here:
1) Having git:// leftovers in some apps makes Heroku get stuck and eventually time out. Having this fixed and manually redeployed also makes heroku run --app patchback bash like 4x faster without that I/O wait.
2) Heroku's integration with GitHub is somehow broken since it still errors out with an Internal Server Error when I try to do anything about it.
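On the first problem: one way to keep a network-bound startup step from eating the whole boot window is to give it a deadline. The sketch below is a Python illustration of that idea, not what the .profile currently does; `run_with_deadline` is a hypothetical name, and the 10-second figure in the usage note is an arbitrary assumption.

```python
import subprocess


def run_with_deadline(command, timeout_seconds):
    """Run command and return its stdout as text.

    Hypothetical helper: returns "" if the command exits non-zero,
    cannot be started, or is still running after timeout_seconds
    (in which case subprocess.run kills it).
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return ""
    return result.stdout
```

For example, `run_with_deadline(["git", "ls-remote", "--tags", "https://github.com/sanitizers/patchback-github-app.git"], 10)` would give up after 10 seconds instead of waiting for the connection to time out on its own. In the .profile itself, wrapping the command in coreutils `timeout` would probably achieve the same thing, assuming it is available on the dyno.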
Oh... Apparently, they've just disabled the GH integration until they're done investigating (emphasis mine):
Update
Based on current progress, we plan to complete our investigation by May 30, 2022. We are continuing with remediation activities and plan to publish additional information about the incident once it’s resolved.
At this time we have seen no additional OAuth tokens compromised beyond what was reported on April 15, 2022. GitHub has contacted all customers they identified as affected by the issue. Heroku completed the necessary password resets on May 5, 2022. We have no evidence of any unauthorized access to Heroku systems since April 14, 2022. In the event we notify customers directly, the email communication will be sent from Salesforce (techcomms@mail.salesforce.com).
We know you are waiting for us to re-enable our integration with GitHub, and we've committed to you that we will only do so following a security review. We will post more information to status.heroku.com when it is available.
Posted 21 hours ago, May 18, 2022 00:38 UTC
(https://status.heroku.com/incidents/2413)
Closing this issue, then. The workaround has been found, and triggering deploys manually is good enough for me for now. Maybe at some point I'll get to integrating this into GHA but it's not a top priority rn.
The bot stopped working. It still worked on Saturday IIRC, but since Monday it definitely no longer works.