microsoft / go

The Microsoft build of the Go toolset
BSD 3-Clause "New" or "Revised" License

AzDO pipeline polling error: "invalid character '<' looking for beginning of value" #1035

Open dagood opened 1 year ago

dagood commented 1 year ago

While polling a build for completion, a call got this error, making the pipeline fail:

invalid character '<' looking for beginning of value

https://dev.azure.com/dnceng/internal/_build/results?buildId=2261106&view=logs&j=8d802004-fbbb-5f17-b73e-f23de0c1dec8&t=995a157e-87d5-56c6-b42a-d8b689a8a0cd&l=409

In my experience, this happens when the API returns HTML instead of the expected JSON. An authentication failure is an easy way to hit it, but this could also be rate limiting or some internal error.
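For reference, the message is what Go's `encoding/json` reports when the first byte of the input is `<`. The snippet below is a minimal reproduction only: `buildStatus` and `parseStatus` are hypothetical names, not part of our tooling or the AzDO library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildStatus is a hypothetical stand-in for the JSON a build-status API returns.
type buildStatus struct {
	Status string `json:"status"`
}

// parseStatus decodes a response body that is expected to be JSON.
func parseStatus(body []byte) (buildStatus, error) {
	var s buildStatus
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	// An HTML error page arriving where JSON was expected.
	html := []byte("<html><body>Services Unavailable</body></html>")
	_, err := parseStatus(html)
	fmt.Println(err) // invalid character '<' looking for beginning of value
}
```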

At the very least, we should change the code to show the whole response when this happens so we can investigate. I'm not sure how deep in the AzDO library this change would need to be made. We should also check whether a newer version of the module addresses this.

dagood commented 1 year ago

https://dev.azure.com/dnceng/internal/_build/results?buildId=2263023&view=logs&j=307c05fb-395e-5cff-ceb4-9869362bab1d&s=f6031391-65b9-5416-93f9-6593f6c32fa6&t=c21f8ad3-d0c9-57f6-6f87-8d3464674117&l=56

Excerpt:

        Azure DevOps Services Unavailable

            Azure DevOps Services

                Sorry! Our services aren't available right now.
                We're working to restore all services as quickly as possible. Please check back soon.
                To see the latest status on our services, please visit our support page.

I don't see anything related in https://status.dev.azure.com/_1es/_history. The fact that it failed just once across a lot of jobs makes me think we're starting to hit flakiness in a few APIs that used to be more solid.

dagood commented 1 year ago

Filed https://portal.microsofticm.com/imp/v3/incidents/details/421846442/home for the outage response.

dagood commented 1 year ago

There was no clear diagnosis at the AzDO level in the IcM ticket (potentially network instability somewhere deeper), and they would like the IcM reopened if we hit it again.

dagood commented 2 months ago

Hit it again at https://dev.azure.com/dnceng/internal/_build/results?buildId=2514553&view=logs&j=19992227-62fb-5b50-4e29-3b72bd33eea1&t=f705825d-763d-59e2-9b89-e307dddf2544&l=1969 during the 1.23.0-1 release.

dagood commented 2 months ago

We hit it during a polling operation this time, which doesn't output the full details, so we don't have e.g. a reference number. I don't think it's worth going through the process to "reopen". (There is no reopen button, so we'd also need to submit a new ticket in some way.)

We should make our polling code a little more robust and print out the full bad output.

Given that this has never been a blocking issue, just an interruption, the value of that work isn't all that high either.