Open gdestuynder opened 9 years ago
@richardweiss
We're not sure if the mitigation was meant for this risk (it might have been a mis-paste). If it was, we may need more information or a meeting about it, as we don't understand it.
The consul acls in this case should ensure that by using separate client certs (x509) each service type can only register itself as the correct service. This prevents a rogue service from registering itself as another service.
First of all, this issue is misunderstood quite a bit.
Consul does not do ACLs based on x509 certs, only its own internal ACL token system (https://consul.io/docs/internals/acl.html).
And our mitigation was to move apps to their own accounts. At that point, Consul is only accessible from inside the VPC and is separate from all other accounts.
A rogue service, resulting from a compromise of some sort, can indeed register as another service, but that's a small issue compared to the damage they could cause by having gained privileged access to the compromised system already.
It's a planned feature nonetheless, to protect services like fluentd, but it's not on the current plan-b roadmap.
Registering as another service means that you compromise more parts of the whole "app" (i.e. all services inside the "app"/VPC serving the app), since you can impersonate them. Controlling this ensures the attacker cannot spread the attack ("security in depth").
I understand that this is not part of your own roadmap; this is just the tracker for the issue, since it's still something we'd generally want to do regardless. It's perhaps akin to spoofing an IP in a regular data center (that's a very rough analogy).
The gossip communication model does support x509 client-side certificates to authenticate who can talk to whom, but not which service is allowed - that's a mistake on our part.
Using the gossip token per service seems like the way to go / the way you want to implement this in the future.
@marianpiper as per gozer's comment, it looks like this is one risk you plan to have accepted for the "plan b" roadmap, just FYI.
The solution is implementing ACL tokens. Part of the plumbing is already implemented in CloudFormation. However, doing this properly is going to require a lot of moving parts. There are a number of components, including:
From Risk Record
Missing Consul service ACLs allows for any host to impersonate services within a product VPC
In the absence of Consul ACLs governing the ability of a host to assert that it is providing a given service via the service discovery mechanism, a compromised host could impersonate an uncompromised host by claiming to provide its service. In doing so the compromised host could obtain data not intended for it.
Consul's gossip network is authenticated insecurely and is susceptible to any compromised machine from any product (root is not needed) impersonating any consul load balanced service
Recommendation:
Establish Consul ACLs and rules to govern Services
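
As a rough illustration of what that recommendation would buy, here is a minimal sketch. It assumes one ACL token per service type and a default-deny policy. `service_rules` renders a rule in Consul's ACL rule syntax, while `ToyCatalog`, the token names, and the host address are all hypothetical stand-ins for illustration, not Consul's API or anything Nubis actually runs. The toy model also matches service names exactly, which is a simplification.

```python
from dataclasses import dataclass, field


def service_rules(service_name: str) -> str:
    """Render ACL rules granting write (register) access to one service."""
    return f'service "{service_name}" {{\n  policy = "write"\n}}\n'


@dataclass
class ToyCatalog:
    """Hypothetical in-memory stand-in for a default-deny service catalog."""

    # token -> the single service name that token may register (assumption)
    grants: dict = field(default_factory=dict)
    # service name -> hosts that successfully registered it
    services: dict = field(default_factory=dict)

    def register(self, token: str, service_name: str, host: str) -> None:
        # Default deny: unknown tokens and mismatched names are rejected.
        if self.grants.get(token) != service_name:
            raise PermissionError(f"token may not register {service_name!r}")
        self.services.setdefault(service_name, []).append(host)


catalog = ToyCatalog(grants={"web-token": "web", "fluentd-token": "fluentd"})

# A legitimate registration with the matching token succeeds.
catalog.register("web-token", "web", "10.0.0.5")

try:
    # A compromised web host tries to impersonate fluentd with its own token.
    catalog.register("web-token", "fluentd", "10.0.0.5")
    impersonated = True
except PermissionError:
    impersonated = False

print(service_rules("web"))
print("impersonation succeeded:", impersonated)
```

With per-service tokens, a compromised host can still register the service it already legitimately provides, but it cannot claim another service's name, which is the spread this risk record is concerned with.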
Mitigation from Nubis team
https://mana.mozilla.org/wiki/display/EA/Mozilla+IT+Application+Migration+to+AWS+-+short+term+solution
This will be mitigated through the per-application account model change. The leftover risk is the same as apps having direct access to the databases (i.e. no middleware). As none of the first 4 apps have middleware, we are forgoing work on this today, as the vector would be the same.