Open vit-zikmund opened 1 year ago
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
Hi there, I just got out of one nice rabbit hole and thought I wouldn't keep the journey to myself :)

TL;DR

Now I know that attaching a disk from one project as a PersistentVolume in a GKE cluster in another project is not allowed, but the driver gave me a real hard time on my way to realizing that - which is what I want to improve for any other unfortunate dead-end journeymen.

The rabbit hole
Trust me, it was a well-meant design decision. We decided to keep some persistent disks in their own project (let's call it _DISKPROJECT), out of scope of an ephemeral project hosting a GKE cluster (_GKEPROJECT). So far we hadn't tripped any warning that this isn't really supposed to work; all the docs mention is that one can't attach disks from different zones, but that was no problem. We set it up per the instructions in the GKE docs. All seemed fine until we wanted to bind the PV/PVC to a workload, when things started sliding the wrong direction without us knowing it yet (the logs are from the workload's namespace events).

Treading that path for the first time, it wasn't particularly easy to figure out that the permissions should be added to the "hidden" engine robot account, but we cracked that one.
Curious and cautious about which permissions would all be required, we naturally hit a couple of new errors right after adding the previous permissions, notably:
Something's already off in this one - it mentions the _GKENODE in the _DISKPROJECT, while the node actually lives in its own _GKEPROJECT. But we didn't notice, so out of a little desperation we added the _DISKPROJECT's compute.admin role to the _GKEPROJECT's robot account, at which moment the "403 Forbidden" errors disappeared, but the strangest (while now more obvious) thing happened. The error became a "404 Not Found", because the GCP API couldn't find the node, which didn't exist in the _DISKPROJECT but did exist in the _GKEPROJECT:

That was a dead end, and at this moment I grew suspicious that the driver (v1.8.7 in our case, but master seems to have it too) "incorrectly" derives the node's project from the disk's. Then I actually found the code that extracts the project from the disk's volumeHandle and uses it in the call to AttachDisk, which eventually leads to the GCP API call that produces the error. Ha!

Before filing a bug for this "obviously" incorrect assumption, I tried to bypass the driver to see if I'd have more luck directly with the API's attachDisk (redacted):

Finally it gave out the error I was missing the whole time:
Whoops, OK then!
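For illustration, the assumption described above can be sketched roughly like this (this is not the driver's actual code; the volumeHandle format and the printed attach call are my simplified rendering of what happens):

```go
// Sketch of the driver's assumption: the project parsed out of the disk's
// volumeHandle is reused as the project of the node the disk is attached to.
package main

import (
	"fmt"
	"strings"
)

// parseVolumeHandle splits a volumeHandle of the form
// "projects/<project>/zones/<zone>/disks/<disk>" into its parts.
func parseVolumeHandle(handle string) (project, zone, disk string, err error) {
	parts := strings.Split(handle, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "zones" || parts[4] != "disks" {
		return "", "", "", fmt.Errorf("unexpected volumeHandle format: %q", handle)
	}
	return parts[1], parts[3], parts[5], nil
}

func main() {
	// The disk lives in _DISKPROJECT, but the node lives in _GKEPROJECT.
	handle := "projects/_DISKPROJECT/zones/europe-west1-b/disks/my-disk"
	project, zone, disk, err := parseVolumeHandle(handle)
	if err != nil {
		panic(err)
	}
	// The faulty step: the attach call targets the *disk's* project, so the
	// API looks for the node in _DISKPROJECT and responds with a 404.
	fmt.Printf("AttachDisk(project=%s, zone=%s, instance=_GKENODE, disk=%s)\n",
		project, zone, disk)
}
```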
Next steps
Given the extent to which the driver wants to enforce that rule itself rather than delegate it to the API, there are likely two scenarios for a fix:

- Reject cross-project requests in validateControllerPublishVolumeRequest, so the user gets a clear error up front.
- Add an instanceProject argument (or whatever) so the node's project would get passed correctly to the API and the user would get the final error.

Either way, right now the driver hides the last, most important API error behind its silent assumption, which is hopefully worth tackling.
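The first option could look something like this - a minimal sketch, assuming a hypothetical validation helper (the function name and error wording are mine, not the driver's):

```go
// Hypothetical early cross-project check, in the spirit of a guard inside
// validateControllerPublishVolumeRequest: fail fast with a descriptive error
// instead of letting the API 404 on a node lookup in the wrong project.
package main

import "fmt"

// validateSameProject returns a descriptive error when the disk's project
// (parsed from the volumeHandle) differs from the node's project.
func validateSameProject(diskProject, nodeProject string) error {
	if diskProject != nodeProject {
		return fmt.Errorf(
			"cross-project attach is not supported: disk is in project %q but node is in project %q",
			diskProject, nodeProject)
	}
	return nil
}

func main() {
	// Mirrors the situation in this issue: disk and node in different projects.
	if err := validateSameProject("_DISKPROJECT", "_GKEPROJECT"); err != nil {
		fmt.Println(err)
	}
}
```

This would have surfaced the real constraint at the very first attach attempt, instead of the misleading 403/404 trail described above.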
Thanks for reading this novel up to here :tada: :smile: