hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

client.network_interface configuration is not respected when running in AWS or Azure #11069

Open Davasny opened 3 years ago

Davasny commented 3 years ago

Nomad version

Output from nomad version

Nomad v1.1.3 (8c0c8140997329136971e66e4c2337dfcf932692)

Operating system and Environment details

Instance s1-2 in OVH Cloud - WAW DC

# uname -a
Ubuntu 20.04 LTS
Linux worker0 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue

Nomad does not respect the client.network_interface configuration when running in AWS or Azure.

Reproduction steps

Check the code:
https://github.com/hashicorp/nomad/blob/1403a06b99b07b6ba9dc5b184005aa8bed316d39/client/fingerprint/env_aws.go#L138
https://github.com/hashicorp/nomad/blob/1403a06b99b07b6ba9dc5b184005aa8bed316d39/client/fingerprint/env_azure.go#L174

I found this issue when I installed Nomad in OVH Cloud (WAW DC) on an s1-2 instance and Nomad detected the provider as AWS. After a few hours of debugging, I found out that OVH runs a service in this DC at address 169.254.169.254 that returns AWS-style data (I reported the issue to OVH, their internal ticket 2847070). It also turns out that Nomad changes unique.network.ip-address to the address returned by this service, which caused a misconfiguration in my case.

Expected Result

unique.network.ip-address is set to the address of the interface defined in the client.network_interface configuration

Actual Result

unique.network.ip-address is overwritten with the value returned by the cloud provider metadata address

Fix proposals:


DerekStrickland commented 3 years ago

Hi @Davasny,

Thanks for using Nomad! We're reviewing your issue as a team, and in the hopes of avoiding assumptions, we were wondering if you could share more about your use case. Could you describe the scenario where this occurs with any extra detail? Sometimes knowing why you work a certain way can really help us come up with the best solution.

Thanks again. We are truly grateful that you took the time to file this issue for the benefit of the community.

@DerekStrickland and the Nomad team

Davasny commented 3 years ago

Hi @DerekStrickland, thanks for taking a look at my issue. I'll try to give better background for it.

I'm using OVH Cloud as a VPS provider. They use OpenStack with EC2 compatibility, and in that mode there is a "Metadata service" running at address 169.254.169.254 - https://docs.openstack.org/nova/rocky/user/metadata-service.html

In this mode, the service at 169.254.169.254 returns EC2-like information, so Nomad thinks "I'm running on EC2 in AWS" and goes into AWS network fingerprinting. Nomad then sets unique.network.ip-address to the address received from the 169.254.169.254 service, which is wrong: it points to the public address, but I configured Nomad to use the private interface (client.network_interface). In OVH Cloud you can use vRack, which is the equivalent of AWS subnets.

My goal is to use the private (internal, isolated network) address as unique.network.ip-address and not the public address, because I need this address in the config templates of some internal processes.

I think Nomad should always respect client.network_interface, even when running in the AWS or Azure clouds, and only use the value provided by 169.254.169.254 if client.network_interface is not set.
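The precedence I have in mind can be sketched in a few lines of Go. This is only an illustration of the proposed rule, not a patch against Nomad: `pickClientIP` and its parameters are hypothetical names, and the addresses are example values.

```go
package main

import (
	"errors"
	"fmt"
)

// pickClientIP applies the proposed precedence. interfaceIP is the address
// of the interface named in client.network_interface ("" if that option is
// not set); metadataIP is the address reported by the 169.254.169.254
// service ("" if none). Hypothetical helper, not part of Nomad.
func pickClientIP(interfaceIP, metadataIP string) (string, error) {
	switch {
	case interfaceIP != "":
		// An explicitly configured interface always wins.
		return interfaceIP, nil
	case metadataIP != "":
		// Fall back to the cloud metadata address only when nothing is configured.
		return metadataIP, nil
	default:
		return "", errors.New("no address available")
	}
}

func main() {
	// Private interface configured: the metadata (public) address is ignored.
	ip, _ := pickClientIP("10.0.0.5", "203.0.113.10")
	fmt.Println(ip)

	// Nothing configured: the metadata address is used as before.
	ip, _ = pickClientIP("", "203.0.113.10")
	fmt.Println(ip)
}
```

With this ordering, users who never set client.network_interface keep the current behavior, and only hosts with an explicit setting see a change.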

Feel free to ask more questions. I can also provide you with some test infrastructure in OVH Cloud if necessary. Nomad is a really good tool for me (I switched from k8s ;) and using a reject rule in the firewall for this 169... service is a very ugly solution.

DerekStrickland commented 3 years ago

Thanks @Davasny!

We'll review this as a team and I'll let you know if we have more questions.

Davasny commented 3 years ago

@DerekStrickland @mikenomitch any update on this?

Davasny commented 2 years ago

@DerekStrickland @mikenomitch any update?

DerekStrickland commented 2 years ago

Hi @Davasny,

My sincerest apologies for the delay!

I've done some investigation into this code, and it turns out that it was a community contribution from 4 years ago, so I wasn't able to go to the author for input.

That said, I'm curious about your thoughts on the best solution of the three you proposed. Here are my thoughts.

add information in documentation "it's not a bug, it's a feature"

After looking at the code here and in other projects, it doesn't seem like this code is necessarily wrong, but rather that you need a way to override it. Others might want this too. Do you have a workaround that works for your use case? If so, it would be nice to include in the documentation. If there is a way to achieve the behavior you want today, I'd prefer to just document that.

don't overwrite this field

I suspect that would break other users that rely on this behavior.

add a new field in the client config to let us use a different interface during fingerprinting

This is likely doable, but not a trivial amount of work. If you don't have a workaround that you think is reasonable, and given that you have first-hand experience with the issue, and know how to reproduce it in your infrastructure, would you be interested in submitting a PR to add the behavior you expect?