[Enhancement] Simple and better security inspired by Wireguard.

ntop / n2n

Peer-to-peer VPN

GNU General Public License v3.0


[Enhancement] Simple and better security inspired by Wireguard. #671

Open KusakabeShi opened 3 years ago

KusakabeShi commented 3 years ago

WireGuard utilizes the following:[5]

Curve25519 for key exchange
ChaCha20 for encryption
Poly1305 for data authentication
SipHash for hashtable keys
BLAKE2s for hashing
UDP-based only.

I don't know much about the details of WireGuard's implementation, but I think we could use asymmetric keys to communicate between nodes instead of one preshared key shared among all nodes.

For example, the supernode holds a peerinfo.yaml like this:

supernode:
  PrivateKey: aHQS6mTdu0G2jzLi9RtNyzmCAoTfsmMLEOHSICHMwlk=
edges:
  client1:
    PublicKey: j/HtUE5qErOlD5yOOoGVj4O7lLKvQzHTaoHwDQETliI=
    MacAddr: 00:01:02:00:00:00
  client2:
    PublicKey: bfoyjgfmU8jEZyX4HgHabK/yH08vKIGVMwhhJRxHC3E=
    MacAddr: 00:01:02:00:00:01
  client3:
    PublicKey: xhOVRByH+n4q6D8/0sak4qWnK3EvzgLcbSnbKs5xNRk=
    MacAddr: 00:01:02:00:00:02

Then each of the three edge nodes holds its own private key:

client1: uMOw2iLYQ/eFNccsoc7CNEH71QBBCk3xZgpEOe0Zi3M=
client2: MH68XsFVzzIAEKlJRU2IdyJGl6klPR01EQffj4Xm2E4=
client3: GBZ5+cpE8KOnSg2se9A0+c4pRQL2x6oLv44DhOREK2E=

and is initialized with the supernode's public key lNPxqERkIF9Gb4R2xGq+czb0XCx7xcdZd7KR9nSZmmM=.

When an edge wants to communicate with another edge, it simply downloads that edge's public key from the supernode (the download itself may be encrypted with the supernode key pair) and encrypts data with that public key, just like WireGuard.

WireGuard probably does more than just encrypt data with a public key; it has a key exchange and other steps. We can copy all these steps to n2n. I think that is enough to meet the security requirements of almost all scenarios.
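The key-exchange step mentioned above can be sketched as follows. This is a minimal illustration, assuming the X25519 function from RFC 7748 and a plain BLAKE2s hash of the shared secret as the session key; WireGuard's real handshake and key derivation are considerably more involved. The pure-Python ladder is not constant-time and is not n2n or WireGuard code, and names such as `session_key` and the edge variables are made up for the example.

```python
import hashlib
import os

P = 2**255 - 19                          # field prime of Curve25519
A24 = 121665                             # curve constant used by the ladder (RFC 7748)
BASE_POINT = (9).to_bytes(32, "little")  # standard base point, u = 9


def x25519(private: bytes, peer_public: bytes) -> bytes:
    """Montgomery ladder from RFC 7748 (illustrative, NOT constant-time)."""
    k = bytearray(private)
    k[0] &= 248                          # clamp the scalar as the RFC requires
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(peer_public, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def public_key(private: bytes) -> bytes:
    """Public key belonging to a 32-byte private key."""
    return x25519(private, BASE_POINT)


def session_key(private: bytes, peer_public: bytes) -> bytes:
    """Hash the raw shared secret into a 32-byte symmetric key (simplified assumption)."""
    return hashlib.blake2s(x25519(private, peer_public)).digest()


# Two hypothetical edges: each keeps its private key and learns the other's public key,
# for example from the supernode's edge list.
edge1_private, edge2_private = os.urandom(32), os.urandom(32)
key_at_edge1 = session_key(edge1_private, public_key(edge2_private))
key_at_edge2 = session_key(edge2_private, public_key(edge1_private))
print(key_at_edge1 == key_at_edge2)
```

Both edges should arrive at the same key without it ever being transmitted, which is what would replace the community-wide preshared key in this proposal.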

Logan007 commented 3 years ago

@KusakabeShi, thank you very much for the inspiration! Actually, this partly is what current ideas boil down to if implemented.

I think that this is a very related discussion because a "password" would need to be hashed and internally serve as a private key. So, along with the "user / password" authentication scheme, an internal "public / private key" scheme comes for free – at least for parts, it could be built upon to fully go public key later.

It is a wonderful idea of going full public key, I really like it, and we should start pondering how to implement it. I just think that we do not meet all prerequisites yet to implement it now (for upcoming 3.x series) as it would require some more thoughts on and probably changes to packet format, protocol, key handling, certificates, key revocation, signed-list-of-federated-supernodes handling at edge-side as well as server-side (in addition to the edge list you have mentioned), and code of course.

Also, we must be aware that this makes n2n gain some extra weight. An edge would not be able to just send a data packet out (the first data packet can serve two purposes at the same time: data transfer and peer-to-peer opener / NAT-hole puncher); it would need to fetch the correct key first and do the proper key calculations before being able to send out any data. Not only does this cause some communication overhead, it will also slow down initial communication because of the computationally expensive curve25519() function, which I have already experimented with for #670. So as not to waste the calculation results, the common secret data should be stored a bit longer in memory, which would lead to slightly higher memory requirements (now, peer edges are purged if not seen for a while).
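The trade-off between recomputing the expensive key agreement and keeping secrets in memory could look roughly like the cache below. It is only a sketch under assumptions: the class name, the `derive` callback (standing in for the curve25519 computation) and the 300-second lifetime are invented for illustration and do not reflect how n2n purges peers today.

```python
import time
from typing import Callable, Dict, Tuple


class SharedSecretCache:
    """Keeps one derived secret per peer MAC and drops entries unused for `ttl` seconds."""

    def __init__(self, derive: Callable[[str], bytes], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._derive = derive            # expensive step, e.g. a curve25519 computation
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}
        self.derivations = 0             # counts how often the expensive step ran

    def get(self, mac: str) -> bytes:
        """Return the secret for `mac`, deriving it only if missing or expired."""
        now = self._clock()
        entry = self._entries.get(mac)
        if entry is None or now - entry[1] > self._ttl:
            secret = self._derive(mac)
            self.derivations += 1
        else:
            secret = entry[0]
        self._entries[mac] = (secret, now)   # refresh the last-used timestamp
        return secret

    def purge(self) -> int:
        """Remove expired entries and return how many were dropped."""
        now = self._clock()
        stale = [m for m, (_, used) in self._entries.items() if now - used > self._ttl]
        for mac in stale:
            del self._entries[mac]
        return len(stale)
```

A longer `ttl` saves recomputation for peers that talk again later, at the price of the slightly higher memory use described above.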

So, from my current perspective, this definitely is worth an effort! However, I do not think that it can come to life before the 4.x series.

Everybody, please share your thoughts.

lucktu commented 3 years ago

I recommend testing WireGuard (I haven't used it either) and taking a cue from it, but don't turn n2n into WireGuard. n2n should still have its own features.

KusakabeShi commented 3 years ago

WireGuard is a layer 3 VPN, n2n is a layer 2 VPN, and n2n can do UDP hole punching, so n2n will never become another WireGuard.

And WireGuard is a lightweight and secure VPN that is now widely used around the world. I think copying the encryption and handshake process from it isn't a bad idea.

Maybe we can split data encryption and data transfer into different layers.

Like this (pseudocode in Python 3):

edges_pubkey = {}
def get_edges_pubkey(macaddr): # Just a cache, returns the public key of the edge. If not cached locally, fetch it from the supernode
    if macaddr in edges_pubkey:
        return edges_pubkey[macaddr]
    edges_pubkey[macaddr] = get_edge_pubkey_from_supernode(macaddr)
    return edges_pubkey[macaddr]
########################################### A layer handles udp hole punching and raw data transfer
punched_edges = {}
def send_data_raw(buf,macaddr):
    if macaddr == "FF:FF:FF:FF:FF:FF":
        raise ValueError("Can't broadcast raw data.")
    dest = {"ip":supernode.ip , "port": supernode.port }
    if macaddr in punched_edges:
        dest = punched_edges[macaddr]
    else:
        try:
            punched_edges[macaddr] = try_to_udp_punch(macaddr)
        except PunchError as e:
            print("udp punch failed")
    socket.senddata(buf,dest )
def recv_data_raw(buf):
    macaddr = buf[:6]  # first 6 bytes
    buf_encrypted = buf[6:] # remains
    recv_data(buf_encrypted , macaddr)

########################################### A layer chooses data encryption and decryption methods
########################################### Wireguard only here, but we may add more encryption options. Like plaintext, or psk
wireguard_state = {} # per-peer state used by the wireguard encryption process (exchanged psks, nonces, cookies, ...) is stored here
def send_data(buf,macaddr):
    if macaddr == "FF:FF:FF:FF:FF:FF":
        # a broadcast is sent as one individually encrypted copy per edge
        edges_list = get_all_edges_from_supernode()
        for edge in edges_list:
            send_data(buf, edge.macaddr)
        return # nothing is sent to the broadcast address itself
    buf_encrypted = wireguard_encrypt(buf,macaddr, get_edges_pubkey(macaddr), wireguard_state )
    send_data_raw(buf_encrypted , macaddr)
def recv_data(buf,macaddr):
    buf_decrypted = wireguard_decrypt(buf,macaddr, get_edges_pubkey(macaddr), wireguard_state )
    tap_device_fd.write(buf_decrypted,macaddr)
########################################### A layer which handles the real encryption
def wireguard_encrypt(buf, macaddr, pubkey, wireguard_state ):
    # all steps the wireguard encryption process does, like handshake, Cookie Reply Packet to protect against DoS
    if macaddr not in wireguard_state:
        if check_buf_content_is_handshake_request(buf):
            send_data_raw( prepare_handshake_response(),  macaddr)
        send_data_raw( prepare_handshake_request(),  macaddr)
        wireguard_state[macaddr] = updated_wireguard_state()
    # other steps
    return buf_encrypted

def wireguard_decrypt(buf, macaddr, pubkey, wireguard_state ):
    # all steps the wireguard decryption process does, like handshake, Cookie Reply Packet to protect against DoS
    return buf_decrypted

I'm not sure whether my pseudocode is clear enough.

Logan007 commented 3 years ago

@KusakabeShi, I can read you and your ideas in the code very well (even though I am not too deep into Python :wink:). I fully agree that with a view to 4.0, it might be necessary to re-think the layers. That would include connections (TCP, UDP, multiple streams?), peer handling, community handling, authentication, routing, encryption (maybe on several levels as for peer validity, header encryption, content encryption, ...), local TAP "outlet" if present (in a true p2p environment I would think of current supernodes as peers without "outlet").

Handling broadcasts is a special issue to put extra thoughts on. I think we will need a commonly shared / negotiated / calculated broadcast key present at all nodes with outlet. Imagine networks of 5,000 edges: Shall every edge keep a listing of the other 4,999 peers (currently, local lists get purged from time to time to make it lightweight)? Requesting a list of 4,999 keys from the supernode (which we might not have anymore in a true p2p scenario) just because the peer wants to send out an ARP packet might strain the lines.

I am aware of Wireguard. I would not want to copy-cat their concept. The one thing that I think is worth considering is public key cryptography in terms of curve25519 which we see a lot these days – it is not unique to Wireguard. There is no need to copy other things such as the rain check handshake (because n2n tries again anyway if rejected – a peer could just reject if too busy). And, let me add that just because it is advertised as lightweight and secure, it does not mean that n2n is not :wink:

As mentioned in #670, there is more to public key cryptography than just adding a public and private key; I would want to include signed keys to give nodes the ability to verify the key against some (one) federation key. Also, I want to keep it as user-friendly as possible because some might find it annoying to handle cryptic certificate strings (plural) instead of one human readable password.

I am glad that discussion for 4.0 started that early. That will give us plenty of time and opportunities to refine ideas to concepts. Please do not stop!

KusakabeShi commented 3 years ago

Oh, I have never put 5000 devices in a single layer 2 network! Even in the real world, I think no layer 2 switch can handle that, because putting too many devices in the same layer 2 network causes huge problems, such as broadcast storms or too many collisions.

As far as I know, n2n is a pure layer 2 VPN, isn't it? If it is a pure layer 2 VPN and it can handle 5000 devices now, that will shock me!

How does the current version of n2n handle broadcasts with 5000 devices?

And will version 4.0 of n2n still be a pure layer 2 VPN?

Logan007 commented 3 years ago

How does the current version of n2n handle broadcasts?

Broadcasts are forwarded to the supernode, which then distributes them to all peers registered with it and to all other federated supernodes, which in turn distribute them to the edges connected to them.

So, the federated supernodes help to share the burden, which allows more edges to be handled.

As broadcasts are encrypted with the payload key, all edges are able to instantly decrypt it. Supernodes cannot as they are lacking the key. They just act as "routers" in this case.
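The distribution described above can be modelled in a few lines. This is a toy sketch, not n2n's supernode code: the `Supernode` class, its fields and the `from_federation` flag are assumptions made for illustration. The payload is treated as opaque ciphertext throughout, reflecting that supernodes lack the payload key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Supernode:
    name: str
    edges: List[str] = field(default_factory=list)              # MACs registered locally
    federation: List["Supernode"] = field(default_factory=list)  # other federated supernodes

    def on_broadcast(self, src_mac: str, payload: bytes,
                     from_federation: bool = False) -> Dict[str, bytes]:
        """Return {edge MAC: payload} for every edge that gets a copy of the broadcast."""
        # Deliver to all local edges except the sender; the payload is never decrypted.
        delivered = {mac: payload for mac in self.edges if mac != src_mac}
        # Only the supernode that received the packet from an edge passes it on,
        # so a broadcast is not bounced back and forth within the federation.
        if not from_federation:
            for peer in self.federation:
                delivered.update(peer.on_broadcast(src_mac, payload, from_federation=True))
        return delivered


sn1 = Supernode("sn1", edges=["00:01:02:00:00:00", "00:01:02:00:00:01"])
sn2 = Supernode("sn2", edges=["00:01:02:00:00:02"])
sn1.federation.append(sn2)
sn2.federation.append(sn1)
copies = sn1.on_broadcast("00:01:02:00:00:00", b"<encrypted ARP frame>")
print(sorted(copies))
```

Each supernode only fans out to its own edges, which is how the federation spreads the load of a broadcast.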

If it is a pure layer 2 vpn, and it can handle 5000 device now, it will shock me!

I have never tried this size myself but I have heard of 5,000 reported in some issue's discussion (#412). I would not be surprised if that was not exaggerated because edges only keep information of peers they really exchange data with (there usually is no full n x n communication required). Broadcasts along with bandwidth might be the limiting factor.

At the time when #412 came up, there had not been a federation feature yet.

And in 4.0 version of n2n, will it still be a pure layer 2 vpn?

4.0 is not really drafted yet, just collecting ideas. For now, I hope that the 3.x series will be somewhat long-lived because 4.x will bring a lot of changes again (following the ideas and discussions) and I would not want to see users shocked by too many changes happening at once or in a short period of time.

But yes, I hope it still will be layer 2. For now, I do not see any need to change that part of n2n. Do you?

KusakabeShi commented 3 years ago

Emmmm, the main problem is broadcast, right?

Maybe we can choose some edges to relay these packets? It's a broadcast packet, so any edge in the same channel shall read the content anyway. A relay edge can therefore decrypt it, read the content, and encrypt it again with its own key for its already-peered edges, which avoids new handshakes.

We can check whether a packet is a broadcast packet and, if so, relay it to peered edges, and figure out other methods to

minimize new handshakes
minimize and filter out duplicate receptions
make sure every edge receives at least one copy
avoid loops
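One common way to meet the duplicate-filtering and loop-avoidance goals in the list above is flooding with a "seen" set. The sketch below assumes every broadcast carries a unique `packet_id` (for example origin MAC plus a sequence number); that identifier, the `RelayEdge` class and its fields are hypothetical, and the decrypt/re-encrypt step per hop is left out.

```python
from typing import List, Optional, Set


class RelayEdge:
    def __init__(self, mac: str) -> None:
        self.mac = mac
        self.peers: List["RelayEdge"] = []   # edges we already have a session with
        self.seen: Set[bytes] = set()        # IDs of broadcasts already handled
        self.received: List[bytes] = []      # payloads handed to the local TAP device

    def on_broadcast(self, packet_id: bytes, payload: bytes,
                     sender: Optional["RelayEdge"] = None) -> None:
        if packet_id in self.seen:
            return                           # duplicate or loop: drop silently
        self.seen.add(packet_id)
        if sender is not None:               # the originator does not deliver to itself
            self.received.append(payload)
        for peer in self.peers:
            if peer is not sender:           # never send straight back to the sender
                peer.on_broadcast(packet_id, payload, sender=self)


# Three fully meshed edges: the graph contains a loop, yet each edge gets one copy.
a, b, c = RelayEdge("a"), RelayEdge("b"), RelayEdge("c")
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
a.on_broadcast(b"a:1", b"who-has 10.0.0.4?")
print(len(b.received), len(c.received))
```

This keeps delivery to exactly one copy per reachable edge, but it does not by itself minimize the number of transmissions; that would need the routing discussed in this thread.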

If the solutions above are still too heavy, I have another suggestion: just provide an option to disable broadcast.

With this option, all edges would register their IP and MAC address with the supernode. When we need to connect to an IP address, the ARP exchange would be handled by the supernode. So we could no longer broadcast packets to all edges; we could only connect to them one by one. Some protocols, such as DHCP, may not work, but I think TCP, UDP, and most other protocols would still work.

It's an option for people who have 50000 edges...
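On the supernode side, the registration idea amounts to a lookup table from IP address to MAC address. A minimal sketch, assuming a hypothetical `AddressRegistry` class; nothing like it is claimed to exist in n2n under this name:

```python
import ipaddress
from typing import Dict, Optional


class AddressRegistry:
    """Maps the TAP IP address of each registered edge to its MAC address."""

    def __init__(self) -> None:
        self._ip_to_mac: Dict[str, str] = {}

    @staticmethod
    def _normalize(ip: str) -> str:
        # Raises ValueError for malformed input instead of storing garbage.
        return str(ipaddress.ip_address(ip))

    def register(self, ip: str, mac: str) -> None:
        self._ip_to_mac[self._normalize(ip)] = mac.lower()

    def unregister(self, ip: str) -> None:
        self._ip_to_mac.pop(self._normalize(ip), None)

    def resolve(self, ip: str) -> Optional[str]:
        """Answer a 'who has <ip>?' question on behalf of the edge, or None if unknown."""
        return self._ip_to_mac.get(self._normalize(ip))


registry = AddressRegistry()
registry.register("10.0.0.3", "00:01:02:00:00:00")
registry.register("10.0.0.4", "00:01:02:00:00:01")
print(registry.resolve("10.0.0.4"))
```

An unknown address simply yields `None`, in which case the asking edge would get no answer, just as with an unanswered ARP request.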

Logan007 commented 3 years ago

Maybe we can choose some edges to relay these packages?

Yes. That is exactly what supernodes do today, and in the future (4.0) other peers will take over. Without "special" supernodes, every node will need to route packets. It is just a question of dynamically organizing it. So, we will need some basic routing protocol to figure out the best routes with low overhead. Routing can be a bitch, as outlined earlier. Today, there is no such need because of the clear role the (federated) supernodes have.

The Kademlia approach discussed at some other issue offers wonderful node lookup mechanisms while keeping local lists limited to some extent. But the plain Kademlia routing that comes with it is not really suitable to detect best paths and optimized trees, e.g. to minimize broadcast. On the other hand, a fully fledged routing protocol might be too heavy. This also needs thoughts.

Actually, going full p2p would mean to replace supernode functionalities by regular peers, such as ping-partner, hole-punching-helper, help with node lookup, routing (broadcast and single-cast in case of no p2p possible) , ... and all that in a protocol as simple as possible :exploding_head:

any edges in same channel shell read the content.

That would imply that all edges know the key. Very easy in case of symmetric cryptography, not so easy in case of asymmetric ciphers, because we would need to keep all peers along with their keys in all local lists, something we want to avoid for scalability. The basic idea of the approach suggested in #670 might be of use to distribute one commonly used broadcast key.

Just provide a option: disable broadcast.

Arp packet will replied by supernode while needed.

I would leave ARP as it is, given its essential function in the ethernet network. It is best handled by the edges' OS. However, it could be filtered, e.g. as the only broadcast packets allowed. Such an option would be nice if it can't be created using a special filter rule (-r).

I hope to get some feedback how federated supernodes help that 5,000 edge scenario. It should help, your drawing in #412 basically already shows the federated supernodes in the middle, no need for routing protocol on top.

KusakabeShi commented 3 years ago

any edges in same channel shell read the content.

Assume I am a relay edge: I mean that I can decrypt the packet and re-encrypt it with my own key to avoid new handshakes.

It is best handled by the edges' OS.

Emmmm, like this?

So .1.3 and .1.4 will not handshake in this broadcast.

Logan007 commented 3 years ago

If the supernode is aware, it certainly is possible – and actually a good idea! – to answer the "who is" or forward it to the supernode that has the information (in federation, every supernode only holds MAC and associated supernode for nodes connected to another supernode). I just wonder if the other nodes do not learn anything from ignored ARP "who is" packets for their own ARP table?

Actually, such a feature would be helpful already for current n2n as well. Do you want to implement it?

But hey, wait: Supernodes can't read PACKET content, so the ARP would need to be translated into a QUERY_PEER packet type (we already have that), and the answer PEER_INFO would need to be locally translated back into ARP again. That could work and would avoid a lot of broadcasts!

The ARP "announce" packets would still need to be broadcast as far as I can see.

KusakabeShi commented 3 years ago

I just wonder if the other nodes do not learn anything from ignored ARP "who is" packets for their own ARP table?

I think yes, but this will not establish new peers because the two things happen in different layers.

Learning the ARP table happens in the OS layer, but establishing peers happens in an n2n-internal layer.

A peer will only be established once some non-broadcast packet is seen.

A packet like an HTTP request or an ARP reply will establish new peers and do the curve25519 stuff. But an ARP probe or ARP announce is a broadcast packet, so it will not establish new peers.
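The rule above boils down to a check on the destination MAC address. A tiny sketch, assuming hypothetical function names and a `known_peers` set; the group bit test itself is standard Ethernet (the least significant bit of the first octet marks broadcast and multicast addresses):

```python
from typing import Set


def is_group_address(dst_mac: bytes) -> bool:
    """True for broadcast and multicast destination MACs."""
    return bool(dst_mac[0] & 0x01)


def should_start_handshake(dst_mac: bytes, known_peers: Set[bytes]) -> bool:
    """Only unicast traffic to a peer without an existing session triggers a handshake."""
    return not is_group_address(dst_mac) and dst_mac not in known_peers


peers = {bytes.fromhex("000102000001")}
print(should_start_handshake(bytes.fromhex("ffffffffffff"), peers))   # ARP probe/announce
print(should_start_handshake(bytes.fromhex("000102000002"), peers))   # first unicast packet
```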

KusakabeShi commented 3 years ago

Is there any detailed documentation or discussion thread about federation? I am still not quite clear about it.

But in my opinion, if the supernode does not join the network, the network may become simpler.

I am wondering whether it is enough if the supernode can only see "This is a broadcast packet" or "This isn't a broadcast, dst macaddr is xxx", so that no QUERY_PEER, PEER_INFO stuff needs to be translated.

With only one supernode, it is enough. I'm not sure whether it is enough with multiple supernodes/federation.

Logan007 commented 3 years ago

I am wondering is it enough that supernode can only see This is a boardcast packet and This isn't a boardcast, dst addr is xxx

The supernode can see the MACs (and therefore classify broadcast or not) but not the packet content (is it an ARP packet?). That's why we need to translate at the edge before sending out. The already existing QUERY_PEER and PEER_INFO can do the job, see below.

I think yes, but this will not establish new peers because it's in different layers.

You are right. New entries in the OS' ARP table might only prevent future ARP "who is" packets from being sent because the nodes were already introduced to each other. But what is the chance that such an entry will be used within the ARP entry's lifetime? In bigger networks it probably is rather unlikely, so this is probably of limited use. But your idea would save a lot of broadcast traffic, which is more important for bigger networks.

So, why don't we extend the QUERY_PEER and PEER_INFO packets with the TAP adapter's IP address (if they do not even have it yet) and build an ARP--QUERY_PEER/PEER_INFO translator at the edge? The ARP does not reveal any information to the supernode that it does not know yet anyway (the TAP IP and MAC addresses are known to it).

On the other hand, a malicious supernode could poison the edge's ARP table then.

doing curve25519 stuff

For 3.0, that would not affect peer-to-peer handshake as the user-password authentication is between edge and supernode to have new edges access the commonly used symmetric key. But if we really go p2p, that would be a great relief for edges that do not really communicate with each other (if we really need curve25519 for the broadcast handling, of which I am not sure yet).

Is there any detailed documentation or discussion thread about federation? I am still not quite clear about it.

Several issues, most of them authored by @fcarli3, also the corresponding pull requests, and the Federation.md document in the /doc folder. Basically, it is a special community only consisting of supernodes. Edges choose "their" supernode based on a load-based selection strategy. Packets that require forwarding (broadcasts or in case of no p2p possible) are forwarded from the source supernode to the destination supernode and then to the final edge.

Do you have specific questions?

Not sure if it enough in multiple supernode/federation

I think I can remember that QUERY_PEER packets are forwarded to the right supernode to get answered from there (PEER_INFO) via the original supernode (because we can't guarantee that other edge's supernode can talk to the original edge).