Closed: pierreprinetti closed this issue 6 months ago.
Currently we only try to read `const instanceIDFile = "/var/lib/cloud/data/instance-id"` when the metadata service has a problem, because we think the local value might change:

https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/instances.go#L514

But it seems we should honor the local value first, then try the metadata service and store the metadata result as a cache (as proposed above)?
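Something like this rough sketch, perhaps. The `fetchInstanceIDFromMetadataService` helper and the `cachedInstanceID` variable are hypothetical names, and concurrency is ignored for clarity:

```go
import (
	"os"
	"strings"
)

const instanceIDFile = "/var/lib/cloud/data/instance-id"

// cachedInstanceID holds the ID once it has been resolved. Hypothetical name;
// concurrency is ignored in this sketch.
var cachedInstanceID string

// fetchInstanceIDFromMetadataService stands in for the real call to the Nova
// metadata service / config drive; the name is hypothetical.
func fetchInstanceIDFromMetadataService() (string, error) {
	return "", nil // placeholder for this sketch
}

// getInstanceID honors the local cloud-init file first, falls back to the
// metadata service, and keeps the result in memory so the service is queried
// at most once.
func getInstanceID() (string, error) {
	if cachedInstanceID != "" {
		return cachedInstanceID, nil
	}
	if b, err := os.ReadFile(instanceIDFile); err == nil {
		cachedInstanceID = strings.TrimSpace(string(b))
		return cachedInstanceID, nil
	}
	id, err := fetchInstanceIDFromMetadataService()
	if err != nil {
		return "", err
	}
	cachedInstanceID = id
	return cachedInstanceID, nil
}
```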
I think we can close this because we're already caching it: https://github.com/openshift/cloud-provider-openstack/blob/0eacab836b290551a6e058f7c6adb55747f1b591/pkg/util/metadata/metadata.go#L259-L260
So whether we're fetching from config drive or metadata, both GetInstanceID and GetAvailabilityZone will fetch the full metadata struct from the cache and therefore hit the metadata service at most once. As both of these are static[^1], this seems fine.
[^1]: InstanceID can only be static. AZ can technically change, but only in a forced live migration by an OpenStack admin with an explicit destination host that violates the server's AZ constraint. This isn't a recommended way to do live migrations.
GetDevicePath doesn't use the cache, which is also correct. Fetching this every time is unavoidable.
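For reference, the cached path boils down to roughly this pattern (a simplified sketch; the identifier names are illustrative, not the exact ones in metadata.go):

```go
// Package-level cache. Metadata and fetchMetadata are stand-ins for the real
// struct and for the code that reads the config drive or the metadata service.
var metadataCache *Metadata

func Get(searchOrder string) (*Metadata, error) {
	if metadataCache == nil {
		md, err := fetchMetadata(searchOrder)
		if err != nil {
			return nil, err
		}
		metadataCache = md
	}
	return metadataCache, nil
}
```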
There is a subtle problem with this code, though: setting the cache itself is racy. The cache is set in code called by the gRPC server, which is multi-threaded. Two threads can obviously race to set the cache initially; while that's a bit inefficient, it's not actually a problem, as both should fetch and therefore cache the same data. However, pointer writes in Go aren't guaranteed to be atomic, so the unsynchronized read and write of the cache pointer is a data race with undefined behaviour. This is probably the least problematic category of race condition, but if we want to be 100% safe we should fix it with either a mutex or an atomic load and store.
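A minimal sketch of one option using sync.Once, reusing the hypothetical names from the sketch above (a mutex or an atomic load/store would work just as well):

```go
import "sync"

var (
	metadataOnce  sync.Once
	metadataCache *Metadata
	metadataErr   error
)

// Get runs the fetch at most once; sync.Once publishes the result with a
// happens-before guarantee, so concurrent gRPC handlers never observe a
// half-written cache pointer.
func Get(searchOrder string) (*Metadata, error) {
	metadataOnce.Do(func() {
		metadataCache, metadataErr = fetchMetadata(searchOrder)
	})
	return metadataCache, metadataErr
}
```

One caveat: sync.Once also latches a failed first fetch, so if we want to retry after a transient metadata-service error, a mutex-guarded nil check is probably the better fit.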
@pierreprinetti close this and create a new issue for the race?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
As discussed in #2217, calls to the Nova Metadata service could probably be reduced when the only goal is to retrieve the instance's ID.
Since the ID is never going to change, it could be fetched once and persisted in memory.
If there is consensus on a need for action on this one, I'd be glad to take it.
CC @zetaab @mdbooth @jichenjc