yoshidan / google-cloud-rust

Google Cloud Client Libraries for Rust.
MIT License
216 stars, 81 forks

[Cloud Storage] Unable to authenticate from Google Kubernetes Engine #219

Closed cetceeve closed 4 months ago

cetceeve commented 7 months ago

Hi, I am trying to upload to Google Cloud Storage. From a local environment running in Docker this works just fine with authenticated-user credentials. However, when I deployed to Google Kubernetes Engine I was unable to get authentication to work.

let config = ClientConfig::default().with_auth().await.unwrap();

The program seems to stall at .with_auth(): no panic, it just waits forever. I tried the authenticated-user credentials I was using locally, both via .with_auth() and via .with_credentials(json) (using this setup). I also tried workload identity following the GKE documentation here and verified that workload identity was set up correctly. In both cases it stalls there forever.

Any ideas on what I might have missed? If this is a bug in the code, I am happy to contribute a fix. Since there are no errors, though, I didn't really have a starting point for where to look.

Anyway, thanks for your work on this library!

For the moment, we are just running it in a plain Docker environment and it works great. Here is the complete code, just in case.

yoshidan commented 7 months ago

Is the metadata server 169.254.169.254 reachable from your GKE container?

If there is no response, the connection attempt may still be pending: a request timeout is specified, but connect_timeout is not, so the client may wait indefinitely on an IP that never answers.

https://github.com/yoshidan/google-cloud-rust/blob/main/foundation/auth/src/token_source/compute_token_source.rs#L40

https://github.com/yoshidan/google-cloud-rust/blob/main/foundation/auth/src/token_source/mod.rs#L26
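A quick way to see the difference is a std-only probe against the metadata address. This is just a sketch of the idea, not the library's actual code, and the one-second deadline is an arbitrary choice:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Attempt a TCP connection to the metadata server with an explicit
// connect timeout; returns true if it connected within the deadline.
fn metadata_reachable(timeout: Duration) -> bool {
    // GCE/GKE metadata server address (link-local; never routed off-node).
    let addr: SocketAddr = "169.254.169.254:80".parse().unwrap();
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

fn main() {
    let start = Instant::now();
    let reachable = metadata_reachable(Duration::from_secs(1));
    // With connect_timeout the call returns promptly either way, instead
    // of potentially hanging as a plain `connect` can on a silent IP.
    println!("reachable: {reachable}, took {:?}", start.elapsed());
}
```

Outside Google Cloud this prints `reachable: false` after at most a second; inside a correctly configured GKE pod it should connect almost immediately.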

Enabling trace logging might tell you something; you can use tracing_subscriber to emit the logs:


use tracing::Level;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Log at INFO and below; raise to Level::TRACE for more detail.
    tracing_subscriber::fmt().with_max_level(Level::INFO).init();

    // your application code
    Ok(())
}
nicolas-vivot commented 5 months ago

This feedback has nothing to do with your original question, but I think it is worth mentioning. Looking at your code, you are creating the GCS client inside the spawned task. This creates a complete new client every single time, which is inefficient and may cause issues as the load on your component increases (especially if you use horizontal scaling).

Long story short, a new GCS client will systematically do the following things:

What I recommend is to:

This way you will avoid doing unnecessary work, and benefit from potential internal caching of credentials, connection-pool reuse, faster task spawning, etc.
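That pattern can be sketched with the standard library alone. The placeholder Client type and std threads below stand in for the real GCS client and tokio tasks; none of this is the library's actual API, and the construction counter exists only to make the point observable:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Placeholder standing in for an expensive-to-build client (auth handshake,
// connection setup); the real GCS client type is assumed, not shown here.
struct Client;

impl Client {
    fn new(constructions: &AtomicUsize) -> Self {
        constructions.fetch_add(1, Ordering::SeqCst); // count each build
        Client
    }

    fn upload(&self, object: &str) {
        println!("uploading {object}");
    }
}

// Build one client up front and share it across workers via Arc;
// returns how many clients were constructed in total.
fn run_workers() -> usize {
    let constructions = Arc::new(AtomicUsize::new(0));
    let client = Arc::new(Client::new(&constructions));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            // Cloning the Arc is cheap; building a client is not.
            let client = Arc::clone(&client);
            thread::spawn(move || client.upload(&format!("object-{i}")))
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    constructions.load(Ordering::SeqCst)
}

fn main() {
    // One client serves all four workers.
    assert_eq!(run_workers(), 1);
}
```

The same shape carries over to async code: build the client once at startup and hand each spawned task a cheap clone rather than constructing a fresh client per task.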

nicolas-vivot commented 5 months ago

@cetceeve I am currently using this library inside GKE with Workload Identity without any issues.

Your code seems correct, so I believe the issue here is some misconfiguration of your cluster or environment.

If you share your Kubernetes manifest i can potentially help you to debug it.

A couple of questions:

cetceeve commented 4 months ago

Thank you so much to both @yoshidan and @nicolas-vivot! 😊 All comments are 100% valid, and I agree that it was probably a misconfiguration issue. It was an autoscaling cluster on spot VMs, but unfortunately I cannot report anything further, since we have decided to move away from GCS as we couldn't cover the cost.

I will close the issue, but the information here might be useful to other developers, too.