Header Format - Githubissues

With n2n's current and possible future feature set (encryption scheme, IV, compression indicator, DH key exchange, FEC, MAC, ...) in mind, I want to start a discussion about standardizing the header preceding the actual data: its format as well as its encryption.

The header should have the same format for all encryption schemes (as of now: AES and TWOFISH) and allow other modular extensions to be used, e.g. custom encryption or compression schemes.

Any ideas or suggestions?

I agree with this proposal.

As found in doc/HACKING, the current header format seems to be as follows:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! Version=2     ! TTL           ! Flags                         !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 4 ! Community                                                     :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 ! ... Community ...                                             :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 ! ... Community ...                                             :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 ! ... Community ...                                             !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20 ! Source MAC Address                                            :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
24 :                               ! Destination MAC Address       :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
28 :                                                               !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32 ! Socket Flags (v=IPv4)         ! Destination UDP Port          !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
36 ! Destination IPv4 Address                                      !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
40 ! Transform ID                  !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
44 ! Payload
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The socket flags provide support for IPv6. In this case the PACKET message ends as follows:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32 ! Socket Flags (v=IPv6)         ! Destination UDP Port          !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
36 ! Destination IPv6 Address                                      :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
40 :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
44 :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
48 :                                                               !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
52 ! Transform ID                  !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
56 ! Encapsulated ethernet payload
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This needs to be verified against wire.c.
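While the layout is still unverified against wire.c, a minimal C sketch of decoding the 20-byte common part (Version, TTL, Flags, Community) exactly as drawn in the diagram may help the discussion. The struct `common_hdr_t`, the function `decode_common`, and the assumption that Flags travels in network byte order are illustrative only and are not taken from the n2n sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COMMUNITY_LEN  16  /* 128-bit community field, per the diagram */
#define COMMON_HDR_LEN 20  /* 4 bytes Version/TTL/Flags + 16 bytes community */

/* Hypothetical struct; field names are illustrative, not from wire.c. */
typedef struct {
    uint8_t  version;
    uint8_t  ttl;
    uint16_t flags;
    char     community[COMMUNITY_LEN + 1]; /* +1 for a terminating NUL */
} common_hdr_t;

/* Decode the first 20 bytes of a packet.
 * Returns 0 on success, -1 if the buffer is too short. */
static int decode_common(const uint8_t *buf, size_t len, common_hdr_t *out) {
    assert(buf != NULL && out != NULL);
    if (len < COMMON_HDR_LEN) return -1;
    out->version = buf[0];
    out->ttl     = buf[1];
    /* Assumption: multi-byte fields are big-endian (network byte order). */
    out->flags   = (uint16_t)((buf[2] << 8) | buf[3]);
    memcpy(out->community, buf + 4, COMMUNITY_LEN);
    out->community[COMMUNITY_LEN] = '\0';
    return 0;
}
```

Comparing the offsets used here with the encode/decode routines in wire.c would be one way to do the verification mentioned above.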

Then, we need to find suitable space for feature flags.

Afterwards, DPI-proof encryption needs to be discussed. Couldn't we use some light block cipher in CBC mode with some random (!) IV or nonce? As packets must remain decryptable by the supernode, why don't we use the community name – it is known to the supernode anyway – as key? In doing so, there also is no need to transmit the whole community name but just some magic number which would indicate successful decryption – my favorite is n2n in ASCII: 0x6E 0x32 0x6E. Using these three bytes instead of the 16 bytes for the whole community name frees up 13 bytes which could be used for feature flags as well as a nonce or (part of) an IV. The block cipher needs to be lightweight in terms of performance, at least for decryption, as the supernode needs to try every community name as key on every packet.

Any thoughts?

~~What about SKINNY as lightweight block cipher?~~

~~That choice would be supported by evaluating this wiki.~~

~~We could use the 64-bit block size and a 64-bit (twea)key. If performance is still of concern, although the cipher already is extremely lightweight, we could even reduce the number of rounds.~~ [Too slow, as SKINNY seems to be optimized for hardware implementation; looking into other lightweight alternatives.]

Here, I wouldn't be too concerned about security, as the header's encryption is just for obfuscation – hiding the packet's n2n character from DPI's eyes. Note that, up to now, the header is not encrypted at all.

The header encryption does not have a negative impact on payload's confidentiality. Au contraire, it adds some as the header does not reveal internal MACs (which usually also are part of the payload) too easily anymore.

If the community name is used as key, we do not need to put too much effort in that encryption anyway. It just keeps away eavesdroppers in the middle of the line but not those attackers who can take control over the supernode. The latter ones will be able to decrypt the headers but do not get more information than now.

Strolling through wire.c, n2n_wire.h, and edge_utils.c showed four things:

1. ~~The Flags field offers some unused bit space ...~~
2. ~~... maybe to indicate compression (#91) (short term)~~ [done, see #237]
3. The suggested changes require some revamp of the whole packet assembly; there will not be a one-stop shop where en-/decryption can be inserted and performed ...
4. ... and thus will take some time (long term).

Open for discussion and hints. Are there any ideas of upcoming changes to the packet assembly anyway?

I want to share the following thoughts on header encryption and would be glad to receive any feedback:

1. To allow for header encryption more easily, the always-present common section of the header needs a slight adjustment – which, for the sake of compatibility, could be restricted to packets of communities with encrypted headers. It would just put the former first-of-line fields Version, TTL, and Flags behind the Community name and thus make the header look as follows:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! Community ...                                                 :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 4 ! ... Community ...                                             :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 ! ... Community ...                                             :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 ! ... Community                                                 !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 ! Version=2     ! TTL           ! Flags                         !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20 ! Source MAC Address ...                                        :
    ...

2. In encrypted headers, the 128-bit Community field shall be used for transmission of an initialization vector (IV), a magic number, as well as the header's length. The latter is important to be able to stop decryption just before the beginning of a possibly following payload. That would change the actual usage of the header fields as shown below:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ! IV ...                                                        :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 4 ! ... IV ...                                                    :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 ! ... IV                                                        :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 ! 24-bit Magic Number, e.g. "n2n" = 0x6E326E    ! Header Length !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 ! Version=2     ! TTL           ! Flags                         !
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20 ! Source MAC Address ...                                        :
    ...

3. The 96-bit IV is transmitted unencrypted as is; the encryption algorithm starts actual encryption at byte number 12, i.e. at the 32-bit word carrying the magic number and the header's length. The community name is used as key; it is only known to the edges and the supernode.
4. Decryption would set up the IV and try every listed community name as key on the following 32 bits of a packet, hoping to find the header's length and especially the magic number. On a match, the decryption process could continue.
5. Before returning from the decryption routine, the original order of fields could be restored, including the step of copying the community name which successfully decrypted the header back into the header – so as not to interfere with the rest of the edge's and supernode's code.
6. The algorithm for de-/encryption should be used as a stream cipher to allow any header size instead of being tied to fixed block sizes. A block cipher in CTR mode starting from the IV's value would do; the slower CFB mode could be considered, too.
7. Even though header encryption "only" adds obfuscation to protect against eavesdroppers, it should be strong enough to withstand analysis efforts. In the case of stream ciphers, we do need to avoid repeating IVs. Otherwise, if packet collectors find two packets sharing the same openly transmitted IV, they only need to XOR the two packets: the keystream cancels out, and the result is the XOR of the two plaintexts, which might show a lot of zeros in wide and especially characteristic portions of the header, thus indicating an n2n packet. That is why the available 128-bit field is proposed to be divided into a 96-bit IV, leaving only a 24-bit magic number.
8. Performance of header encryption is of concern. I am having a look at the already present Twofish as well as SPECK, which was developed by the NSA and even recommended to US government entities for cases where AES (not always present in n2n) does not meet performance constraints.
9. Coding would include adding some header_encryption.c and ... .h files; encryption would hook in right before the calls to sendto_sock or send_to, respectively. If required, decryption would be called directly after reception.
10. The decision for or against encrypted headers shall be made per community (no mixed communities).
11. A supernode shall be able to seamlessly handle communities with encrypted headers next to those without encrypted headers. To allow for comfortable auto-detection, some additional logic is required to distinguish packets with encrypted headers from those without.
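To illustrate the encryption and community lookup described above, here is a rough C sketch, not a proposal for the final code. It assumes a Speck-style 128-bit block cipher run in CTR mode, a key made from the zero-padded community name, and a counter block built from the 96-bit IV plus a 32-bit block counter. All function names are hypothetical, the byte ordering is an arbitrary choice, and the cipher routine has not been checked against published test vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_LEN 12        /* 96-bit IV, transmitted in the clear */
#define MAGIC  0x6E326Eu /* "n2n" in ASCII */
#define ROUNDS 32

static uint64_t ror64(uint64_t x, int r) { return (x >> r) | (x << (64 - r)); }
static uint64_t rol64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Little-endian load of 8 bytes, independent of host byte order. */
static uint64_t load64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Speck128/128-style block encryption; round keys are derived on the fly. */
static void speck_encrypt(const uint64_t key[2], uint64_t *x, uint64_t *y) {
    uint64_t k = key[0], l = key[1];
    for (uint64_t i = 0; i < ROUNDS; i++) {
        *x = (ror64(*x, 8) + *y) ^ k;
        *y = rol64(*y, 3) ^ *x;
        l = (ror64(l, 8) + k) ^ i;   /* next round key */
        k = rol64(k, 3) ^ l;
    }
}

/* Assumption: the key is simply the community name, zero-padded to 16 bytes. */
static void key_from_community(const char *name, uint64_t key[2]) {
    uint8_t padded[16] = {0};
    size_t n = strlen(name);
    memcpy(padded, name, n < 16 ? n : 16);
    key[0] = load64(padded);
    key[1] = load64(padded + 8);
}

/* CTR mode: XOR a keystream onto pkt[IV_LEN .. hdr_len).
 * The same call encrypts and decrypts; the IV itself is left untouched. */
static void header_crypt(uint8_t *pkt, size_t hdr_len, const uint64_t key[2]) {
    assert(hdr_len >= IV_LEN);
    uint64_t iv_lo = load64(pkt); /* IV bytes 0..7 */
    uint64_t iv_hi = (uint64_t)pkt[8] | (uint64_t)pkt[9] << 8 |
                     (uint64_t)pkt[10] << 16 | (uint64_t)pkt[11] << 24;
    for (size_t off = IV_LEN, ctr = 0; off < hdr_len; ctr++) {
        uint64_t x = iv_hi | ((uint64_t)ctr << 32), y = iv_lo;
        speck_encrypt(key, &x, &y); /* one 16-byte keystream block */
        for (int i = 0; i < 16 && off < hdr_len; i++, off++) {
            uint64_t w = i < 8 ? y : x;
            pkt[off] ^= (uint8_t)(w >> (8 * (i % 8)));
        }
    }
}

/* Try each community name on the 32 bits after the IV.
 * Returns the index of the first name that reveals the magic, or -1. */
static int find_community(const uint8_t *pkt, size_t len,
                          const char *const names[], int n) {
    if (len < IV_LEN + 4) return -1;
    for (int i = 0; i < n; i++) {
        uint8_t probe[IV_LEN + 4];
        uint64_t key[2];
        memcpy(probe, pkt, sizeof probe); /* work on a copy */
        key_from_community(names[i], key);
        header_crypt(probe, sizeof probe, key);
        uint32_t magic = (uint32_t)probe[12] << 16 |
                         (uint32_t)probe[13] << 8 | probe[14];
        if (magic == MAGIC) return i;
    }
    return -1;
}
```

With a 24-bit magic number, a wrong key passes the check by chance with probability of about 2^-24 per attempt, so a supernode serving very many communities might want a second plausibility check (e.g. on the header length) before accepting a match.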

What do you think?

With a view to #237 and #246, the need for a dedicated compression-indicating field in the PACKET-specific area becomes evident. For now, free-riding on the transform field might be acceptable until a new major release (3.0?) breaks compatibility with the packet format anyway.

As a next step, I would like to proceed with implementing the header encryption. I am still wondering whether to use a 64-bit block size (faster lookup check as in the above-mentioned step 4 – especially on 32-bit CPUs) or a 128-bit block size (faster decryption of the whole header – on 64-bit CPUs)... A by-product could be a new optional but built-in (independent of external libraries and thus always available) lightweight cipher for payload encryption.

Also, I want to take the opportunity to implement additional features such as replay protection and a checksum by free-riding (again!). This time, on the IV for header encryption, following the idea roughly sketched below:

  24 bit checksum (over whole packet including payload)
+ 24 bit timestamp
+ 48 bit pseudo-random number or counter reset with every new timestamp
_______________________________________________________________________
--> 96 bit pre-IV

 encrypted (format preserving, key is community name-derived but
 different from header encryption key) to

--> 96 bit IV (encryption will have it look pseudo-random)

A corresponding decryption step will reveal the checksum and the timestamp to the receiver for further use. If those features are not enabled, the additional en/decryption step could be omitted. This way, we do not need to add further dedicated fields. The downside is (if even considered a downside), checksum and replay protection will not be available without header encryption. In my opinion, as checksum and replay protection already are higher-level security, header encryption could be considered a kind of natural prerequisite. ~~That is why I would stick with this pattern even if major packet structure updates allowed an opportunity for changes.~~ EDIT: Scratch that – for environments with forcibly unencrypted transmission (for use on HAM?) checksum and timestamp would be desirable features, too. So, after intermediately free-riding as shown above, the long term perspective should comprise dedicated header fields.
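The 96-bit pre-IV layout sketched above could be packed and unpacked roughly as follows. This is only an illustration: the field order follows the sketch, but the big-endian byte order, the struct `pre_iv_t`, and the helper `timestamp_fresh` (one possible way to use the 24-bit timestamp for replay protection) are assumptions. The format-preserving encryption step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRE_IV_LEN 12 /* 24 + 24 + 48 bits = 96 bits */

/* Hypothetical in-memory form of the pre-IV. */
typedef struct {
    uint32_t checksum;  /* low 24 bits used */
    uint32_t timestamp; /* low 24 bits used */
    uint64_t counter;   /* low 48 bits used */
} pre_iv_t;

/* Serialize the three fields into 12 bytes, most significant byte first. */
static void pre_iv_pack(const pre_iv_t *p, uint8_t out[PRE_IV_LEN]) {
    assert(p != NULL && out != NULL);
    for (int i = 0; i < 3; i++) out[i]     = (uint8_t)(p->checksum  >> (16 - 8 * i));
    for (int i = 0; i < 3; i++) out[3 + i] = (uint8_t)(p->timestamp >> (16 - 8 * i));
    for (int i = 0; i < 6; i++) out[6 + i] = (uint8_t)(p->counter   >> (40 - 8 * i));
}

/* Inverse of pre_iv_pack. */
static void pre_iv_unpack(const uint8_t in[PRE_IV_LEN], pre_iv_t *p) {
    assert(in != NULL && p != NULL);
    p->checksum = 0;
    p->timestamp = 0;
    p->counter = 0;
    for (int i = 0; i < 3; i++) p->checksum  = (p->checksum  << 8) | in[i];
    for (int i = 0; i < 3; i++) p->timestamp = (p->timestamp << 8) | in[3 + i];
    for (int i = 0; i < 6; i++) p->counter   = (p->counter   << 8) | in[6 + i];
}

/* Assumed replay rule: accept a packet if its 24-bit timestamp lies within
 * `window` ticks of `now` in either direction, computed modulo 2^24. */
static int timestamp_fresh(uint32_t ts, uint32_t now, uint32_t window) {
    uint32_t diff = (now - ts) & 0xFFFFFFu;
    return diff <= window || diff >= 0x1000000u - window;
}
```

A receiver would first reverse the format-preserving encryption, then unpack, then compare the checksum and check the timestamp before processing the packet further.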

If I am able to stick to the 11-step scheme from above, all of this will even be compatible with current dev (and will easily remain compatible with any later changes to the packet format). So, if I do not encounter too many difficulties and unexpected coding issues, this could become part of a minor intermediate release – if planned (2.8?).

Starting to code! Will take some time. Any hints?

The previous one was very good. The focus should be on the communication speed of 10,000 edge connections.

@skyformat99 Just for clarification: What exactly do you mean by "the previous"?

Concerning the speed, encryption will cost some speed – that is why I was looking for an extremely fast cipher. In addition, I plan to leave it to the user whether to turn header encryption on or off. So, it is not mandatory to use it.

I am interested: Do you already successfully employ a network of 10,000 edges? If yes, would a multiple-supernode approach be helpful to you?

ntop/n2n, issue #198: Header Format