logos-co / nomos-node

Nomos blockchain node

Mixnet PoC #273

Closed youngjoon-lee closed 1 year ago

youngjoon-lee commented 1 year ago

NOTE: This topic is being developed in the mixnet branch.

What needs to be done

Expecting that the mixnet (potentially Nym mixnet) achieves most of network privacy requirements that we want,

Expected outputs

Resources

youngjoon-lee commented 1 year ago

Before plugging the mixnet into our simulation or our node implementation, we need to decide on a strategy for using the mixnet in gossipsub. During our off-site, we considered using the mixnet only for the first hop of gossipsub. The good news is that Lighthouse (a Rust implementation of Ethereum) also uses the same strategy, if this slide is accurate: https://github.com/ChainSafe/lighthouse/blob/nym/libp2p-nym-integration-demo.pdf

After this first shielded hop, all other hops can be conducted through regular TCP transport to avoid additional latency. The block producer's identity is already anonymized.

youngjoon-lee commented 1 year ago

The simplest approach would be:

Basically, we can implement a libp2p-mixnet transport, as ChainSafe did: https://github.com/ChainSafe/rust-libp2p-nym.

But, for our sim app, it seems that we need to implement another NetworkInterface which mixes packets. I'm not sure if it would be simple or not. If it's not simple, it would be easier to just implement the libp2p-mixnet transport, integrate it into the nomos-node, and compare the difference with regular libp2p-based performance.

bacv commented 1 year ago

If we use the simulation, you are right about the network interface. This could be achieved by introducing a new MixnetInterface, which uses NetworkMessage with a payload of MixnetPacket.

MixnetPacket could hold the information about the 3 random destinations and the actual CarnotMessage. If the interface receives the MixnetPacket and the max forward count is not exceeded, then it increases it and sends the message back to the network.
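A minimal sketch of the forwarding rule described above, assuming hypothetical stand-ins for the simulation types (`CarnotMessage`, `NodeId`, `Action`, and `handle_packet` are illustrative names, not the actual simulation API). It also assumes the sender runs the same function to emit the packet towards the first destination, and that the last destination is the one that delivers the payload.

```rust
/// Hypothetical stand-in for the simulation's consensus message type.
#[derive(Debug, Clone, PartialEq)]
pub struct CarnotMessage(pub Vec<u8>);

/// Hypothetical node identifier used by the simulated network.
pub type NodeId = usize;

/// A packet relayed through a fixed list of destinations before delivery.
#[derive(Debug, Clone, PartialEq)]
pub struct MixnetPacket {
    /// Randomly chosen destinations (three in the description above).
    pub route: Vec<NodeId>,
    /// Number of hops already taken; the maximum is `route.len()`.
    pub forward_count: usize,
    /// The actual message being carried.
    pub payload: CarnotMessage,
}

/// What an interface does with a packet it currently holds.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Send the packet back into the network, addressed to `next`.
    Forward { next: NodeId, packet: MixnetPacket },
    /// The forward count is exhausted: hand the payload to the local node.
    Deliver(CarnotMessage),
}

/// If the max forward count is not exceeded, increase it and forward;
/// otherwise deliver the wrapped message.
pub fn handle_packet(mut packet: MixnetPacket) -> Action {
    if packet.forward_count < packet.route.len() {
        let next = packet.route[packet.forward_count];
        packet.forward_count += 1;
        Action::Forward { next, packet }
    } else {
        Action::Deliver(packet.payload)
    }
}
```

With a three-entry route, the packet is forwarded three times and the payload is delivered at the third destination.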

danielSanchezQ commented 1 year ago

If we use the simulation, you are right about the network interface. This could be achieved by introducing a new MixnetInterface, which uses NetworkMessage with a payload of MixnetPacket.

MixnetPacket could hold the information about the 3 random destinations and the actual CarnotMessage. If the interface receives the MixnetPacket and the max forward count is not exceeded, then it increases it and sends the message back to the network.

Are we sure we want to introduce this simulation wise? If I think straight about this, in reality adding a mixnet will just add an average delay of Xms to delivery. So I am not sure we actually really want to simulate this. Simplest solution would be to add a general mixnet_delay attribute in settings and add that to general message dispatching delay.

TLDR: Why should we actually implement mixnet delivery in the simulation app?

youngjoon-lee commented 1 year ago

TLDR: Why should we actually implement mixnet delivery in the simulation app?

I agree. Last Monday, I thought that we needed some tool to measure the latency/bandwidth amplification in various environments, because we don't yet have a specific strategy for adopting the mixnet in our architecture. I thought the simulation could be a great tool for this purpose. However, after studying the simulation last week, I now guess that it may take some time to extend it for this purpose, because our simulation is currently for analyzing Carnot consensus, not for measuring performance according to the internal tech stack or architecture.

In order to analyze the behaviour of Carnot under mixnet, it would be enough to add a mixnet_delay attribute, as you suggested.
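As a rough illustration of the `mixnet_delay` idea, here is a sketch that assumes a hypothetical `NetworkSettings` struct with a `base_delay` field; neither name is taken from the actual simulation settings. The mixnet is modelled only as an extra average delay added to the regular dispatching delay.

```rust
use std::time::Duration;

/// Hypothetical subset of the simulation settings.
#[derive(Debug, Clone)]
pub struct NetworkSettings {
    /// Regular one-way message delay between two nodes.
    pub base_delay: Duration,
    /// Extra average delay attributed to the mixnet; `None` disables it.
    pub mixnet_delay: Option<Duration>,
}

impl NetworkSettings {
    /// Total delay applied when dispatching a message.
    pub fn dispatch_delay(&self) -> Duration {
        self.base_delay + self.mixnet_delay.unwrap_or(Duration::ZERO)
    }
}
```

Keeping the attribute optional lets the same configuration be run with and without the mixnet, so the two Carnot results can be compared directly.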

Instead, from the perspective of evaluating the mixnet, I now think it would be quite straightforward to quickly implement a mixnet transport (PoC) for libp2p. With it, we may be able to simply run multiple nodes (even on a local machine) and measure performance compared with regular libp2p.

danielSanchezQ commented 1 year ago

Instead, from the perspective of evaluating the mixnet, I now think it would be quite straightforward to quickly implement a mixnet transport (PoC) for libp2p. With it, we may be able to simply run multiple nodes (even on a local machine) and measure performance compared with regular libp2p.

Indeed, also the mixnet protocol would probably have to be implemented as a transport itself anyway. So to me it sounds about right.

danielSanchezQ commented 1 year ago

I remember seeing that nym mixnet uses a nym identifier for nodes. If it is not a security/privacy breach.

Open questions:

youngjoon-lee commented 1 year ago

I remember seeing that nym mixnet uses a nym identifier for nodes. If it is not a security/privacy breach.

Open questions:

  • Does linking the node_id (staking public key) to the nym/network identifier breach privacy?
  • Is this even possible to do?

This is a super super naive architecture regarding your suggestions, even though this still has a lot of black boxes to be well designed.

image
  • Could we leverage those identifiers to send p2p messages between nodes without direct connections to each others?
  • I have the feeling that this would save a lot of bandwidth for voting purposes and make a big step. Does it?

Yes. I guess we can do it by sending a message through the mixnet to a specific recipient, as described in the 1st diagram. The key point that we should check is whether the mixer can resolve the IP address from a virtual_id (e.g. nym_id or network_id). I guess we can get a hint from Nym because they actually do it.

Does linking the node_id (staking public key) to the nym/network identifier breach privacy?

If an adversary operates a node in the mixnet and the node can resolve the IP address from the node_id (associated with network_id) and he can see the staking amount of the node_id, he may be able to try (D)DoS. Hmm. If he cannot see the staking amount, I guess he cannot choose which node he is gonna attack

youngjoon-lee commented 1 year ago

I'm just sharing the Nym architecture that I've studied so far from their source code. Please note that some information may be incorrect.

[image: mixnet drawio]

This diagram can answer some of my questions:

Q1: How do senders get information about mixnodes to construct Sphinx packets?

Q2: Why does Nym introduce the layered mixnet topology?

Q3: Isn't exposing the topology on the Nyx blockchain a security flaw?

Q4: Is there any mechanism to resolve an IP address from a Nym address?

Q5: How do mixnodes communicate with each other?

From this, we have some topics that we need to think about when designing our architecture:

T1. Constructing mix routes (for now, not considering the layered topology, for simplicity).

T2. Associating nomos::NodeId with libp2p::PeerId or others (if we don't use gossipsub for some cases, as @danielSanchezQ suggested)

T1 has a higher priority, compared to T2, in my opinion.

al8n commented 1 year ago

Amazing diagram and explanation! @youngjoon-lee 👍

youngjoon-lee commented 1 year ago

T1. Constructing mix routes (for now, not considering the layered topology, for simplicity). T2. Associating nomos::NodeId with libp2p::PeerId or others (if we don't use gossipsub for some cases)

Although these topics aren't resolved yet, I just published a draft design of mixnet integration, which uses mixnet only for the first hop of gossipsub: https://github.com/logos-co/nomos-node/pull/288.

youngjoon-lee commented 1 year ago

Although I didn't read the full source code of paritytech/mixnet and https://github.com/paritytech/substrate/pull/14207 yet, they store the topology (IP addr and pubkey) of mixnodes on the blockchain, so that mixnet clients can construct packet routes using that information, as Nym mixnet does. The difference with Nym is that they don't have the "layered" topology. They choose N mixnodes randomly and expect all mixnodes to be fully connected with each other, if I understand it correctly.
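To make the non-layered approach concrete, here is a sketch of picking N distinct mixnodes at random from a shared topology. All names (`MixNode`, `SplitMix64`, `build_route`) are hypothetical and are not the paritytech/mixnet or Nym APIs. The tiny PRNG is only there to keep the example dependency-free; it is not cryptographically secure, and a real client would need a secure random source.

```rust
use std::net::SocketAddr;

/// What a client needs to know about a mixnode: where it is and which
/// key to encrypt the packet layer with.
#[derive(Debug, Clone, PartialEq)]
pub struct MixNode {
    pub addr: SocketAddr,
    pub public_key: [u8; 32],
}

/// SplitMix64, a small deterministic PRNG. NOT cryptographically secure;
/// used here only so the sketch has no external dependencies.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Picks `num_hops` distinct mixnodes with a partial Fisher-Yates shuffle.
/// Returns `None` if the topology has fewer nodes than requested.
/// Assumes every mixnode can reach every other one (no layers).
pub fn build_route(
    topology: &[MixNode],
    num_hops: usize,
    rng: &mut SplitMix64,
) -> Option<Vec<MixNode>> {
    if num_hops > topology.len() {
        return None;
    }
    let mut indices: Vec<usize> = (0..topology.len()).collect();
    for i in 0..num_hops {
        // The modulo introduces a slight bias, acceptable for a sketch.
        let remaining = (indices.len() - i) as u64;
        let j = i + (rng.next_u64() % remaining) as usize;
        indices.swap(i, j);
    }
    Some(indices[..num_hops].iter().map(|&i| topology[i].clone()).collect())
}
```

A layered topology like Nym's would instead pick one node per layer, which this sketch deliberately leaves out.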

danielSanchezQ commented 1 year ago

Although I didn't read the full source code of paritytech/mixnet and paritytech/substrate#14207 yet, they store the topology (IP addr and pubkey) of mixnodes on the blockchain, so that mixnet clients can construct packet routes, as Nym mixnet does. The difference with Nym is that they don't have the "layered" topology. They choose N mixnodes randomly, if I understand it correctly.

So yeah, having just the mixnodes addresses onchain should be no problem. Issues will be coming when mixing them out with the nomos nodes. We really need to think about this.

youngjoon-lee commented 1 year ago

So yeah, having just the mixnodes addresses onchain should be no problem. Issues will be coming when mixing them out with the nomos nodes. We really need to think about this.

Exactly. First of all, the issue is that we don't have a shared storage (like blockchain) to store the topology, even if we don't think about privacy.

Instead, I'm thinking about using a DHT for advertising mixnode information: https://github.com/logos-co/nomos-node/pull/288#discussion_r1282925537.

zeegomo commented 1 year ago

First of all, the issue is that we don't have a shared storage (like blockchain) to store the topology

Why not? If necessary can't we require that information to be available like we assume the stake distribution is?

Then, it will probably lower the privacy of Nomos nodes to share the IP address of all nodes with each other

Nodes have to communicate their address to be able to be contacted, I don't think there's any way around this. Whether the address is shared with part of the network or the whole network, I don't think it makes any difference (once the information is out you can probably do little to restrain who has access to it). What we might be interested in doing is avoiding node id - ip address linkability

youngjoon-lee commented 1 year ago

Why not? If necessary can't we require that information to be available like we assume the stake distribution is?

True, but I think it depends on which layer we're going to implement the mixnet in, because the state (synced by consensus) will be in upper layers of the networking. Nevertheless, we can also have the shared topology to be injected from the upper layer to the networking layer (mixnet). As you said, if we really need the shared topology, I believe we can design any way possible.

Nodes have to communicate their address to be able to be contacted, I don't think there's any way around this. Whether the address is shared with part of the network or the whole network, I don't think it makes any difference (once the information is out you can probably do little to restrain who has access to it). What we might be interested in doing is avoiding node id - ip address linkability

I now have the same thought. As long as we avoid nomos::NodeId <> IPAddr linkability, it's probably okay to have the topology shared with all nodes. And, I think we can avoid linkability because we need only IP addresses and public keys for mixnet.

alvatar commented 1 year ago

Yes, I think that's the key. NodeId <> IP unlinkability. The IPs in the system are known to everyone. The topology can be stored on-chain, or could be computed deterministically based on a {random seed, list of validators}
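One possible reading of "computed deterministically based on a {random seed, list of validators}" is sketched below: every node ranks the same list of keys by a hash of (seed, key) and keeps the first `size` entries, so all nodes derive the same topology without extra communication. The function names are hypothetical, the entries are treated as opaque 32-byte keys, and FNV-1a is used only to keep the example dependency-free; a real design would need a cryptographic hash and an unbiasable seed.

```rust
/// 64-bit FNV-1a hash. NOT cryptographically secure; illustrative only.
fn fnv1a(bytes: impl Iterator<Item = u8>) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Derives the mixnet topology from a shared seed and a shared key list.
/// The result does not depend on the order of `keys`, because entries are
/// sorted by (score, key) before the first `size` are taken.
pub fn derive_topology(seed: &[u8], keys: &[[u8; 32]], size: usize) -> Vec<[u8; 32]> {
    let mut ranked: Vec<(u64, [u8; 32])> = keys
        .iter()
        .map(|key| (fnv1a(seed.iter().chain(key.iter()).copied()), *key))
        .collect();
    ranked.sort();
    ranked.into_iter().take(size).map(|(_, key)| key).collect()
}
```

Changing the seed (for example once per epoch) reshuffles the ranking, while any two nodes holding the same seed and key set agree on the outcome.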

What are your thoughts on NodeID-IP unlinkability? What would the Mixnet use to define the next hop, Node ID or IPs?

youngjoon-lee commented 1 year ago

Yes, I think that's the key. NodeId <> IP unlinkability. The IPs in the system are known to everyone. The topology can be stored on-chain, or could be computed deterministically based on a {random seed, list of validators}

What are your thoughts on NodeID-IP unlinkability? What would the Mixnet use to define the next hop, Node ID or IPs?

@alvatar Thank you. I'm pretty sure that we can unlink NodeId from the IP that is needed for the mixnet. From the perspective of the mixnet, NodeId is not necessary. What we need for the mixnet is IPs and public keys (so that we can encrypt Sphinx packets for next hops). Here, we should use public keys that aren't associated with NodeId (or staking accounts).

danielSanchezQ commented 1 year ago

What would the Mixnet use to define the next hop, Node ID or IPs?

This is the main question. If we can make the network resolve the IP from the networkId (related to the nodeId) in a decoupled way, without the protocol knowing the exact address, then we could use the mixnet to send packages directly to nodes instead of broadcasting (this is something to consider for the future but it is not the main concern atm). But it is difficult and we have no answer atm. Correct me if I'm wrong @youngjoon-lee !

Side note: I'm referring to having a distributed network layout where nodes do not really know how to reach other nodes directly, but they can route the packages through the mixnet with a clear destination using network identifiers instead of IP addresses. We had some conversations about this. But again, we do not even know if it's possible, nor is it the focus now. Just leaving this here for future review.

youngjoon-lee commented 1 year ago

Then we could use the mixnet to send packages directly to nodes instead of broadcasting (this is something to consider for the future but it is not the main concern atm). But it is difficult and we have no answer atm.

Yes. So far, I haven't found any solution for this. Even without thinking about the mixnet, I think it's not easy.

Let's say we derive nomos::NodeId and libp2p::PeerId (a network ID that you mentioned) from the same public key. And, let's assume that nomos::NodeId is associated with its stake, and assume that we have a way to convert nomos::NodeId into libp2p::PeerId somehow.

The scenario that we're talking about is sending a VoteMsg to a specific node with NodeId-1, for example. With the assumption above, we can derive a PeerId-1 from the NodeId-1. Using PeerId-1, we can ask libp2p::Kademlia to find Multiaddr-1 of the PeerId-1, and can establish a direct connection with Multiaddr-1 if the peer discovery was successful.

But then, the node operator can become aware that Multiaddr-1 is associated with NodeId-1, which might have a huge amount of stake.

Even if we adopt mixnet, I think this doesn't change, as long as we associate NodeId with PeerId.
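The linkability concern in this scenario can be modelled in a few lines. This is only a toy model under the assumptions stated above: the ID conversion is public (here simply the identity function), the stake table is visible to everyone, and the DHT is replaced by a plain map. None of the names correspond to the nomos or libp2p APIs.

```rust
use std::collections::HashMap;

pub type NodeId = [u8; 32];
pub type PeerId = [u8; 32];

/// Assumption: both IDs derive from the same public key, so anyone can
/// convert one into the other. Modelled here as the identity function.
pub fn peer_id_of(node_id: &NodeId) -> PeerId {
    *node_id
}

/// Joins the public stake table with peer-discovery results and returns
/// the address of the discoverable node holding the most stake.
/// `dht` is a stand-in for Kademlia lookups (PeerId to address).
pub fn address_of_largest_staker<'a>(
    stakes: &HashMap<NodeId, u64>,
    dht: &'a HashMap<PeerId, String>,
) -> Option<&'a str> {
    stakes
        .iter()
        .filter_map(|(id, stake)| dht.get(&peer_id_of(id)).map(|addr| (*stake, addr.as_str())))
        .max_by_key(|(stake, _)| *stake)
        .map(|(_, addr)| addr)
}
```

The join needs nothing secret, which is why the association itself, rather than the transport in use, is what exposes high-stake nodes.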

But yeah, this is not the main topic for now, as you said.

youngjoon-lee commented 1 year ago

@danielSanchezQ @Zeegomo @al8n @bacv I've made three PRs for the Mixnet PoC. Although they aren't meant to be merged to the master, they contain almost everything about our potential approach of the mixnet integration. (Code quality is very bad now). I also added a README there, which described the current approach and some considerations that we need to think about.

I'm trying to measure the latency (block time) amplification of the mixnet integration. Also, thinking of having a call with you guys about these PRs if necessary.

danielSanchezQ commented 1 year ago

@danielSanchezQ @Zeegomo @al8n @bacv I've made three PRs for the Mixnet PoC. Although they aren't meant to be merged to the master, they contain almost everything about our potential approach of the mixnet integration. (Code quality is very bad now). I also added a README there, which described the current approach and some considerations that we need to think about.

I'm trying to measure the latency (block time) amplification of the mixnet integration. Also, thinking of having a call with you guys about these PRs if necessary.

I think we should rebranch from master and have a branch just for this feature where we can add any other mixnet-related PR. We would have to rebase that branch on master to keep it updated, but it is better than falling behind too much.

youngjoon-lee commented 1 year ago

I think we should rebranch from master and have a branch just for this feature where we can add any other mixnet-related PR. We would have to rebase that branch on master to keep it updated, but it is better than falling behind too much.

Yeah. I just created the mixnet branch based on the latest master, and rebased three PRs, so that those can be merged to the mixnet branch.

danielSanchezQ commented 1 year ago

I think we should rebranch from master and have a branch just for this feature where we can add any other mixnet-related PR. We would have to rebase that branch on master to keep it updated, but it is better than falling behind too much.

Yeah. I just created the mixnet branch based on the latest master, and rebased three PRs on it.

The idea of rebasing was for keeping it up to date with master. The PRs should be squash-merged on it instead of main 😅