upbound / up

The @upbound CLI
Apache License 2.0

fix: exporter retrying on transient errors #460

Closed phisco closed 3 months ago

phisco commented 3 months ago

Description of your changes

Sometimes while resources are being created or deleted, we could hit the following errors:

cannot export resources for "defaultsecuritygroups.ec2.aws.upbound.io": cannot fetch resources: cannot list "defaultsecuritygroups.ec2.aws.upbound.io" resources: the server could not find the requested resource

Or:

cannot get GVR for "dnstxtrecords.network.azure.upbound.io": cannot get REST mapping for "dnstxtrecords.network.azure.upbound.io": no matches for kind "DNSTXTRecord" in version "network.azure.upbound.io/v1beta1"

Or:

cannot fetch CRDs: cannot list CRDs: an error on the server ("dial tcp 10.96.125.164:443: connect: connection refused") has prevented the request from succeeding (get customresourcedefinitions.apiextensions.k8s.io)

Or connection refused errors, which we should retry instead:

cannot fetch resources: cannot list "flowlogs.ec2.aws.upbound.io" resources: an error on the server ("dial tcp 10.96.170.58:443: connect: connection refused") has prevented the request from succeeding

These errors are transient and should not block exports: errors for resources that can no longer be found can simply be ignored, while connection refused errors should be retried.

I have:

How has this code been tested

These are transient errors, so they are hard to reproduce consistently, though they happen quite often in CI with parallel tests.