Closed tmatsuo closed 4 years ago
@bcoe did we get the no-Cloud-Tasks-for-Build-Cop change rolled out?
@tbpg yes, build cop should not be using cloud tasks right now.
On one hand, this means we should be able to handle large payloads, on the other hand it makes it slightly harder to see failures.
Looking at the logs, the payload seems to be missing when it gets to the buildcop Cloud Function. There are a lot of errors like this:
[undefined/undefined] No xunitXML and no testsFailed! Skipping.
Perhaps this should be an error so we catch this in the future?
The last successful logs were the evening of July 21. .
2448b1c62ad4d8f3ab80776e039a92af811bb6c7 and e381a7dbc4e2bdb3b3a3dae46d312cf2ed150385 were the two commits on July 21. But, they don't seem related? The gcf-utils update didn't update buildcop. The logging change only changed gcf-utils (and not the version of gcf-utils that other packages depend on).
Could it be that the scheduler/pubsub payload never had the repo/owner in it and was being injected by handleScheduled()
(before we switched off background tasks): https://github.com/googleapis/repo-automation-bots/blob/cc3b7bccda259cf10bbba44f245aa9534892cffe/packages/gcf-utils/src/gcf-utils.ts#L455-L469
we could test this by either checking the messages in the pub/sub queue in GCP (I don't see any messages in there right now) or we could log the payload just before this line: https://github.com/googleapis/repo-automation-bots/blob/cc3b7bccda259cf10bbba44f245aa9534892cffe/packages/gcf-utils/src/gcf-utils.ts#L357
Ok I think I know what's wrong here. We originally had buildcop
using Task queues to handle tasks in the background and so when we received a payload like this from Pub/Sub:
{
"name": "buildcop",
"type": "function",
"location": "us-central1",
"installation": {
"id": "5943459"
},
"repo": "GoogleCloudPlatform/ruby-docs-samples",
"commit": "bc55d50b4acf0f4144e94b2a77c34cba038be49f",
"buildURL": "[Build Status](https://source.cloud.google.com/results/invocations/cbf6df00-2459-40e6-86d0-3574d4d87e6c), [Sponge](http://sponge2/cbf6df00-2459-40e6-86d0-3574d4d87e6c)",
"xunitXML": "PD94bW...[edited]...Cg=="
}
We would then pass this payload to handleScheduled()
in gcf-utils.ts
: https://github.com/googleapis/repo-automation-bots/blob/4f928b100586a3c2d69acb846e6081c0b9f03038/packages/gcf-utils/src/gcf-utils.ts#L358-L360
handledScheduled()
would then call scheduledToTask()
with this payload:
https://github.com/googleapis/repo-automation-bots/blob/4f928b100586a3c2d69acb846e6081c0b9f03038/packages/gcf-utils/src/gcf-utils.ts#L455-L458
scheduledToTask()
would then transform this payload to add the repository
object: https://github.com/googleapis/repo-automation-bots/blob/4f928b100586a3c2d69acb846e6081c0b9f03038/packages/gcf-utils/src/gcf-utils.ts#L501-L509
So when we stopped using background tasks by adding this flag we completely skipped this transformation process: https://github.com/googleapis/repo-automation-bots/blob/4f928b100586a3c2d69acb846e6081c0b9f03038/packages/buildcop/src/app.ts#L20
Buildcop is now looking for body.repository.name
and body.repository.login
but that doesn't exist because the repository
doesn't exist: https://github.com/googleapis/repo-automation-bots/blob/cc3b7bccda259cf10bbba44f245aa9534892cffe/packages/buildcop/src/buildcop.ts#L113-L117
Solution:
Option 1 - we change buildcop to look at body.repo
instead
Option 2 - we modify gcf-utils to do the transformation and add the repository
key even when background: false
Happy to make a PR with the fix if we decide on which option to go with...I would think option 2 would be the better choice.
Thank you for the fix! Reopening because the bot is still not working. Let's wait until we get the gcf-utils upgrade through and we see successful logs.
To be clear, I'm good to change the buildcop code to get the owner/repo from wherever.
No worries! I'm waiting on @bcoe to merge this PR https://github.com/googleapis/repo-automation-bots/pull/762 (I believe he is testing a new release framework) before updating buildcop's gcf-utils version.
I think making the change within gcf-utils is the better way to go since buildcop won't be the only bot that will want to parse the owner/repo name from the GitHub payload. There was also some confusing flows for scheduler payloads vs. pubsub payloads that needed fixing, so this was a good opportunity!
Confirmed via logs that issue is now fixed
Thank you!
This morning we had some failures, but we didn't get reports from build cop bot. This time, the logs should be small, so I think it's another failure, so I'm filing a new bug.
Python 3.6 build
Excerpts from the log:
You can download the test.xml file from the artifact page
These are also builds with the same failure: Python 3.7 build Python 3.8 build
Python 3.8 build also has another failure which we also didn't get the report: