oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
244 stars 36 forks source link

endpoints for firewall rules #623

Open jessfraz opened 2 years ago

jessfraz commented 2 years ago
rmustacc commented 2 years ago

Hey @jessfraz, at least from an API perspective our initial view of firewall rules was as one larger object. This was done mostly from the perspective of allowing atomic updates on the atomic rule sets, that's why there isn't a common POST/DELETE. The goal here was to allow the use of etgas in a way that you couldn't add/delete a rule without context of the surrounding ones. Phrased differently the current firewall as a whole is a single object.

If useful, I can dig up a bunch of the past discussions on this for context.

jessfraz commented 2 years ago

As an end user, that is going to be super annoying. I don’t think I’ve seen people do it that way? But maybe I am mistaken. Honestly its like we re-created the shittiness of iptables in api form.

On Jan 23, 2022, at 4:37 PM, Robert Mustacchi @.***> wrote:

Hey @jessfraz https://github.com/jessfraz, at least from an API perspective our initial view of firewall rules was as one larger object https://rfd.shared.oxide.computer/rfd/0021#_vpc_firewall_apis. This was done mostly from the perspective of allowing atomic updates on the atomic rule sets, that's why there isn't a common POST/DELETE. The goal here was to allow the use of etgas in a way that you couldn't add/delete a rule without context of the surrounding ones. Phrased differently the current firewall as a whole is a single object.

If useful, I can dig up a bunch of the past discussions on this for context.

— Reply to this email directly, view it on GitHub https://github.com/oxidecomputer/omicron/issues/623#issuecomment-1019609358, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALA23CQU3N3ROGEYWC4KM3UXSNMLANCNFSM5MT5KFOA. You are receiving this because you were mentioned.

jessfraz commented 2 years ago

so for example someone using the api that just wants to add one rule:

they need to get them all first then add their new one, so if I make a cli command to add a rule I need to get them all first then add it, why wouldn't we just make the endpoints so its not the worst

rmustacc commented 2 years ago

Just for context, the array of rules in one endpoint is how AWS and GCP work. We had a discussion around this for routes and firewalls in this control plane huddle. The tradeoff with the approach you're describing is the inability to perform an atomic update of the rule set. I get that from some ergonomics in the client, that makes the above worse. But aren't most updates going to be doing conditional PUTs to make sure resources don't change out from under them in general, which requires fetching the resource to begin with?

Nothing is set in stone, but the goal was to try and address a class of issues some of us have encountered in the past -- not to create the worst API ever. Please give us the benefit of the doubt. Obviously without having end-to-end working consumers nothing we've done is perfect and we're going to learn a lot more from this. Thanks for raising this.

jessfraz commented 2 years ago

Seems like you can still add a single rule tho (at least on GCP):

https://cloud.google.com/compute/docs/reference/rest/v1/firewallPolicies/addRule

davepacheco commented 2 years ago

they need to get them all first then add their new one, so if I make a cli command to add a rule I need to get them all first then add it, why wouldn't we just make the endpoints so its not the worst

If the CLI doesn't get them all first, is it possible to avoid the dueling administrator problem? That seems somewhat high stakes here, since it would seem pretty easy for an unexpected combination of rules to disrupt connectivity and trigger a major incident for the customer.

Is there some scalability problem you're worried about or is it just the code required to fetch the rules first?

jessfraz commented 2 years ago

Its annoying to have to fetch them all first, as someone who usually just adds a rule or two, I like that gcloud has an API for adding one, which then allows them to have a cli command for adding one. But if the cli is just fetching them all and adding one, why wouldn't we just make an endpoint to do that exact same thing. Also you could say the same thing you are saying about API endpoint couldn't you? Like deploying a VM, how do you prevent two from being created at the same time, I just don't really understand the problem you are solving with this but maybe I am obtuse

davepacheco commented 2 years ago

Its annoying to have to fetch them all first, as someone who usually just adds a rule or two, I like that gcloud has an API for adding one, which then allows them to have a cli command for adding one.

I get that there's a potential cost to adoption if we're expecting customers to build directly atop our API and they have to write extra code for what should be an easy operation. I don't want to minimize that. But I do want to be clear about the problem: I think we're just talking about the extra code required to build a CLI add-rule command without an API that just does it for you. This doesn't affect the user of the CLI as far as I understand.

Also you could say the same thing you are saying about API endpoint couldn't you? Like deploying a VM, how do you prevent two from being created at the same time, I just don't really understand the problem you are solving with this but maybe I am obtuse

Yes, the same concern applies to most non-create endpoints. But a user can avoid clobbering somebody else's changes by using HTTP conditional requests with an ETag (once that's implemented). That doesn't work here because if you have the ETag, then you already fetched the rules, so you wouldn't need the "addRule" API. (You specifically asked about "create", which is different. The way we're doing creates with POST, it's hard to have a conditional request. That said, I don't think it's common for two create operations to interfere with each other in quite the same way, where both requests succeed but the resulting system is broken in a way that neither administrator could have anticipated.)

I'm not arguing that we shouldn't have an add-rule API because of the dueling administrators problem. Rather, I'd argue that if we're only going to build one API, it should be one that allows users to deal with this problem if they want to because my experience is that it's essential in order to build robust automation atop an API. Put differently: a customer can always build add-rule on top of what we're doing, even if it's a bit of work. They cannot build a firewall modification mechanism atop add-rule/remove-rule that's robust to concurrent changes, which means we're forcing those users to build some lock outside of our system.

If we accept that we need the endpoint that lets users address this problem, then I'd also argue for API minimalism: if something can be done on the client with no cost in terms of correctness, flexibility, scalability, or security, let's do that rather than add more API surface area. This might be at odds with adoption considerations and maybe that's a reason to build convenience APIs.

jessfraz commented 2 years ago

so then why not do both, keep the endpoints you already have and add the endpoints for single rule adding for the lazy users. not everyone has n-teen hours in a day for just writing bits of code to account for our API not having said features, we should make our API usable by anyone out the box