kinvolk / lokomotive

🪦 DISCONTINUED Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy to use and secure Kubernetes distribution from the volks at Kinvolk
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0
320 stars 49 forks source link

Autodiscover Packet private CIDR #246

Open johananl opened 4 years ago

johananl commented 4 years ago

https://github.com/kinvolk/lokomotive/blob/49e8346d89408856bd54bbe86cc4b2353b0ee81a/examples/packet-production/cluster.lokocfg#L73

Instead of asking the user to figure out what the CIDR means, we can query for it.

As a note, there is an edge case here: IIRC when a brand new Packet project is created and before the first device had been deployed, there is no Packet private CIDR since it's allocated lazily by the Packet IPAM service. We may want to consider this when working on this change.

surajssd commented 4 years ago

Isn't it something that Packet CCM should solve and any kubernetes manifest can be populated from there?

johananl commented 4 years ago

Where would the CCM store this information? We have to remember that the CCM is a generic component, i.e. it shouldn't know or care about Lokomotive.

rata commented 4 years ago

I dont think the CCM is related to that, at all. It will just add labels/annotations to nodes, right?

As a side note, Packet adds a rule like this on every VM (tested ubuntu and flatcar), so internal IP is used:

$ ip r
...
10.0.0.0/8 via 10.80.180.8 dev bond0 proto static 

So, they already rely on 10.0.0.0/8 being used for the internal network and don't bother to set the node_cidr. I believe this was the case before this new Packet feature too.

Using that, should fix the issue without autodiscovery.

However, we might want to restrict it to the smaller CIDR possible and that is fine too. Just wanted to point out the other alternative, just in case.

johananl commented 4 years ago

AFAIK we use the private CIDR mainly for filtering inter-node traffic using GNPs, not for routing.

rata commented 4 years ago

Yes, my point was to note that using 10.0.0.0/8 for node_cidr will work and is something packet relies on today. We don't necessarily need to auto-discover it. I'm not saying we should use this, just to have all the options on the table to make a decision.

surajssd commented 4 years ago

We can list private_ipv4 for a packet project. This wasn't the case always, was it?

https://www.terraform.io/docs/providers/packet/d/ip_block_ranges.html#attributes-reference

invidian commented 4 years ago

Also, if we're unable to find private CIDR during cluster creation, we could start with 10.0.0.0/8 and then update it to lock it further.

rata commented 4 years ago

We can list private_ipv4 for a packet project. This wasn't the case always, was it?

https://www.terraform.io/docs/providers/packet/d/ip_block_ranges.html#attributes-reference

I think it was there, the "tricky thing" (that can be solved with some terraform dummy resources) is that it doesn't exist when no server is deployed or if you delete all servers "for a while" (packet might change it the next time you create a server if no server is currently running).

There even was an attempt to use it, that didn't work: https://github.com/kinvolk/lokomotive-kubernetes/pull/43

johananl commented 4 years ago

We can list private_ipv4 for a packet project. This wasn't the case always, was it? https://www.terraform.io/docs/providers/packet/d/ip_block_ranges.html#attributes-reference

I think it was there, the "tricky thing" (that can be solved with some terraform dummy resources) is that it doesn't exist when no server is deployed or if you delete all servers "for a while" (packet might change it the next time you create a server if no server is currently running).

There even was an attempt to use it, that didn't work: kinvolk/lokomotive-kubernetes#43

Right. So in that case it makes sense to me to do what @invidian has suggested:

rata commented 4 years ago

I agree something like that makes sense. But I'd go simpler: always use 10.0.0.0/8 and afterwards always replace with the allocated CIDR. This way there is only one path (instead of two: CIDR might be used at the cluster creation if it was known, or updated later if it was not known at cluster creation).

That way, we will avoid hitting issues when the CIDR is created or not, and clusters broken in non-obvious ways because one path was less tested than the other. For example, if we go with the approach that has two paths, the CI will always test the path where the CIDR exists right now (as there is the CI server already, therefore an CIDR allocated). We would lack testing in the CI about the other path. And if going that other path fails, it is not be detected by the CI and can be difficult to debug when we hit it on real usage.

If we just do the simpler way (always use 10.0.0.0/8 and always update the CIDR after deployment), we always will use the same path and the CI will be testing that path exactly (no need to do complicated things to have the CI test that).

What do you think?

invidian commented 4 years ago

@rata idea sounds good to me as the default behavior. One thing I would keep in mind is, that it would be nice to do all of that in one run, so there is no need to run cluster apply twice. Of course all that should only happen, if user didn't specify the CIDR manually.

rata commented 4 years ago

Not sure I agree. The point of the issue is to NOT ask users this. My understanding was to remove that option.

johananl commented 4 years ago

My main point when opening this issue was indeed that we shouldn't force the user to worry about the Packet private CIDR if all they care about is getting a working cluster.

As for whether we automate the CIDR "discovery" by default or remove the option altogether - I'm not sure. I do see value in allowing the user to narrow down the inter-node traffic filtering to something more restrictive than the entire project CIDR. I don't see this option as very practical since users can't properly encapsulate the notion of "all nodes belonging to this cluster" in a CIDR, but I also don't see a strong reason to remove this knob.

Again, my main point was - if we can figure out the low-level Packet details on behalf of the user, we should do it, at least by default.

rata commented 4 years ago

I do want to make a point to remove options, even optional, if they are not useful for the user :-D. Like in this case: it will be autocompleted by lokomotive and there is no usable way to tune it to any other value for now. I strongly think we should remove options that there is no point in changing them, and only add them when a valid use case is known.

Sorry for the long thread, we can maybe make a summary in the original description :)

johananl commented 4 years ago

As I've recently learned, it's possible to have multiple private CIDRs on Packet for the same project and facility. This can make things more complicated in the context of this issue.