input-output-hk / mithril

Stake-based threshold multi-signatures protocol
https://mithril.network
Apache License 2.0

Security Issues of the Mithril Software Architecture #1487

Closed onyxstakepool closed 6 months ago

onyxstakepool commented 7 months ago

Why

The current Mithril network architecture is susceptible to various types of denial-of-service attacks on the block producer nodes that run the Mithril signer. The goal of the Mithril network project to interact with the stake-weighted majority of block producers makes this an even more pressing issue. The close interaction of the Mithril signer with the cardano-node database and the access to secret keys and certificates as outlined in the Architecture overview creates a myriad of security issues.

The design decision that a single server, the Mithril aggregator, is interacting with the most important Cardano nodes on the network, the stake-weighted majority of the block producers, is a huge security risk. Even an inadvertent bug in the Mithril software could disrupt the majority of the block producers and lead to a catastrophic failure of the Cardano network.

Having such a single point of failure risk on the Cardano network is not acceptable.

It also puts an enormous trust burden on a piece of software that is maintained outside of the main cardano-node repository.

What

A reasonable approach to mitigate the main security issues above is to integrate the Mithril signer code into the cardano-node software and use the established networking infrastructure to relay the required Mithril information over the Cardano P2P network. The Mithril aggregator can then listen for specific Mithril messages on the Cardano P2P network without the need to penetrate the firewalls and infrastructure of the stake pool operators.

abailly-iohk commented 7 months ago

@onyxstakepool Thanks for raising this issue.

We think the actual picture is less bleak than the one you are painting here, but it's clear we are lacking good resources on the Mithril network threat model and an updated description of the architecture. We are planning to address your concerns promptly, e.g. in the next few days (work on issue #1350 has been stalled but needs to be moved forward).

In order to help us provide the best possible answer, could you please provide some details on the "myriad of security issues" you are seeing here (taking into consideration our short discussion on the #moria channel)?

onyxstakepool commented 7 months ago

@abailly-iohk Thanks for your response.

After learning in the Discord Moria channel that it is not technically necessary to run the Mithril signer on a block producer, I am much less concerned now about the security implications. Since a simple relay server is also sufficient for the Mithril signer, only the KES key and the node certificate are at risk. Even if these two pieces are leaked from the relay, an attacker would not be able to replicate the block producer since the VRF key is still missing. The KES key and the node certificate can be quickly rotated without any harm to the real block producer.

With this understanding, I suggest changing the Mithril documentation to recommend the naive deployment for mainnet as well. See: Mithril signer deployment model

My original security concern was that an attacker would spend considerable effort to compromise the Mithril repository and thus gain access to the block producers and wreak havoc on the whole Cardano network.

abailly-iohk commented 7 months ago

IIRC we updated the suggested deployment model from relay to BP because some SPOs participating in Mithril were concerned about the potential of KES keys leaking when deployed on a relay which is inherently less protected 🤔

Your concern about supply-chain-based attacks is definitely valid and something we are also concerned with and should mitigate, but to be fair this is also something the cardano-node itself is subject to. We will take that into account in the threat model.

jpraynaud commented 7 months ago

@onyxstakepool Thanks for this issue.

Indeed, the new deployment model has been jointly designed with pioneer SPOs' input and feedback, and they raised concerns about having the KES keys exposed on a relay. Prior to the mainnet launch, in late June 2023, we started rolling out this new deployment (as announced in this dev blog post: https://mithril.network/doc/dev-blog/2023/06/28/signer-deployment-models).

As @abailly-iohk already mentioned, we are currently working on a threat model for the Mithril network. This will help us get a better picture of the security-related issues, and it will also be an opportunity for the community to contribute and give some feedback.

In the meantime, feel free to share any specific attack scenario that you would already have in mind :+1:

Issue #1488 has also been created to make the architecture diagrams easier to understand.

reqlez commented 7 months ago

Indeed, it was decided that keeping keys on world-facing cardano-node relays was not very acceptable. However, I don't see how an outgoing connection from a mithril-signer to an aggregator is a huge risk, personally.

In my mind, it's no different than you trusting the NTP client app to connect to an NTP server and not "get hacked" in the process.

"creates a myriad of security issues" I would like to see some example attack scenarios if I'm to be convinced. The way I see it, an attack would have to involve the mithril signer software having a serious bug PLUS the aggregator getting hacked at the same time and crafting a malicious response to the signer that could potentially execute some code through a buffer overflow or similar. I guess that is theoretically possible? I think the engineers would have a better idea how feasible it is to execute something like this.

RE supply chain attacks, this is no different than people using any of the add-on software that SPOs use, like Koios Tools, scripts, cncli, etc. Not to mention that you also rely on the operating system you run your software on, and the huge number of libraries and packages, not being hit by a supply chain attack.

Don't get me wrong, I do support the idea of Mithril just being integrated into the core protocol, but maybe it's a bit too early for this?

onyxstakepool commented 6 months ago

@disassembler just pointed out security issues with squid in the Mithril channel. https://www.cvedetails.com/vulnerability-list/vendor_id-9950/product_id-17766/Squid-cache-Squid.html

abailly-iohk commented 6 months ago

@onyxstakepool For the sake of completeness, you should also have posted @disassembler's answer

Looks like squid had a release recently patching a number of vulnerabilities: https://www.squid-cache.org/Versions/v6/squid-6.8-RELEASENOTES.html This wasn't the case when we made the decision to use traffic. Also traffic has that vulnerability mentioned above patched in 9.2.3 released back in October. Good news is both products when using the latest release have zero CVEs at the moment but I encourage anyone running infrastructure to keep up on tracking CVEs for anything they're running. The ossec mailing list is a good way to get early notifications of vulnerabilities.

and also @jpraynaud's answer later on:

Actually, the Squid vulnerabilities (majority of which are DoS attacks) are not directly applicable to the Mithril usage for the following reasons:

- Squid is used only by the Mithril signer to forward proxy its HTTPS calls to the Mithril aggregator (containing only public data).
- Squid is not used to relay any traffic coming from outside of the SPO infrastructure, and it is not caching any data.
- The firewall and Squid configurations that we recommend enforce that only the Mithril signer can have its calls relayed.

Squid is a piece of software and like all pieces of software can have vulnerabilities. Deploying and using any software for business-critical missions requires constant monitoring and attention to security and performance issues which, fortunately for the case of squid, are public and promptly fixed.
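To make the relay rule quoted above concrete, here is a minimal Python sketch of the allow-list logic it describes: a request is forwarded only if it comes from the signer host and targets the aggregator over HTTPS. This is an illustration only, not the recommended Squid or firewall configuration; the `ProxyPolicy` class, the IP address, and the hostname are hypothetical placeholders.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class ProxyPolicy:
    """Hypothetical model of a forward-proxy allow-list (not Squid's API)."""

    signer_ip: str            # assumed internal address of the Mithril signer host
    aggregator_host: str      # assumed hostname of the Mithril aggregator
    aggregator_port: int = 443  # HTTPS

    def allows(self, source_ip: str, dest_host: str, dest_port: int) -> bool:
        # Relay only calls that originate from the signer and
        # go to the aggregator on the HTTPS port; deny everything else.
        return (
            ip_address(source_ip) == ip_address(self.signer_ip)
            and dest_host.lower() == self.aggregator_host.lower()
            and dest_port == self.aggregator_port
        )


# Placeholder values for illustration only.
policy = ProxyPolicy(signer_ip="192.168.1.10",
                     aggregator_host="aggregator.example.org")
```

With such a policy, `policy.allows("192.168.1.10", "aggregator.example.org", 443)` is accepted, while a call from any other source address, or to any other host or port, is rejected. A real deployment would express the same rule in the proxy and firewall configuration rather than in application code.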

abailly-iohk commented 6 months ago

BTW, seems like Traffic is also subject to security exploits: https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-19990/Apache-Traffic-Server.html

onyxstakepool commented 6 months ago

@abailly-iohk Thanks for addressing all the concerns above.

Let me explain here why I keep voicing concerns about how Mithril is set up.

First, as soon as you open a door (port) to your server, you are in the business of defending this door. So, for critical infrastructure you just do not open these doors unless absolutely necessary.

For the cardano-node I am quite confident that every bit that flows in and out of the server is meticulously processed and checked.

For the Mithril signer, relay, and aggregator the whole security concept looks much more ad hoc:

1) I must run the signer with the same user as the cardano-node.
2) The signer must be granted unrestricted access to the node database.
3) The signer needs access to secret keys.
4) The relay (squid) adds more complexity to keep everything secure.
5) The signer and aggregator communicate over their own channel (port).
6) The aggregator is connected to, and controls, the signers on the stake majority of the block producers.

So, I must put a lot of trust into all these additional software pieces. What happens if the aggregator gets compromised or gets seized by the authorities? Then the aggregator software can be manipulated and potentially corrupt the signers and block producers of the stake majority or corrupt the databases of relay servers or even leak keys and so forth.

You might still think this is all moot. Here is a story: in the early days of Bitcoin there was an IRC client built in for bootstrapping the network. It took only 4 lines of obfuscated source code to subvert the IRC client to execute arbitrary system commands on the node server. Security analysis showed that this was the downfall of one of the largest crypto exchanges at the time. All wallet keys were leaked. Thereafter all the IRC code was removed from the Bitcoin client.

Now we have the same pattern with Mithril: an unchecked HTTP channel deep into the core SPO infrastructure that sends data back and forth, waiting to be exploited.

This is why I am concerned. Also, in the future there might be more than one aggregator with more delegated trust. The risks are just increasing.

abailly-iohk commented 6 months ago

@onyxstakepool Thanks a lot for clearly voicing your concerns. It's a bit late for most of us right now, so I won't be able to respond in detail, but let's keep the conversation going and make sure we address those in the clearest possible way.

ch1bo commented 6 months ago

Should we maybe move this discussion to a GitHub security advisory? The same people could be involved, but it would follow our Security guideline and we can still publish the outcome (a fix, change in practice, improved documentation, ...) as a definitive advisory.

abailly-iohk commented 6 months ago

That makes sense @ch1bo, but it seems to me we are here discussing general principles and architecture rather than specific vulnerabilities, so perhaps doing it in the open would be better. Perhaps it would be even better to turn this into a Discussion?