mas-bandwidth / netcode

Secure client/server connections over UDP
BSD 3-Clause "New" or "Revised" License
2.43k stars 190 forks source link

Rust implementation #8

Closed vvanders closed 7 years ago

vvanders commented 7 years ago

I've started on a wrapper of netcode for Rust located at https://github.com/vvanders/netcode-rust (until we can figure out best way to merge).

Right now only client functions are hooked up and nothing is tested but it's a bit of a start. Currently it bootstraps netcode via "gcc" crate which shells out to msvc pretty cleanly on windows. Still need to sort out linux.

I'll update this issue once I've got something more stable, this is mostly a placeholder to discuss ongoing integration and related items.

gafferongames commented 7 years ago

Sounds fantastic. Thanks!

I'm restructuring the project right now so there is a rust directory you can work under, and a go directory, in case somebody else wants to do that.

I've moved all my work under "c"

gafferongames commented 7 years ago

Oh wait, it's a wrapper or an implementation in Rust? I'm confused. If it's a wrapper, I think it's best to keep it as a separate project.

But if somebody wants to create an actual implementation in rust, yes I would accept that in this repository.

cheers

vvanders commented 7 years ago

Yeah, it's a wrapper for now, sorry if I wasn't clear on it up front. Totally fine with keeping it in a separate repo.

My plan was to wrap the c impl and make sure to nail down a solid API + docs. Once that's done then migrate things piecewise over to a Rust impl. Probably server first followed by client. That way anyone can pick up Rust bindings right away since it's going to take me a while to get a proper implementation off the ground.

gafferongames commented 7 years ago

Sounds pretty good, let me know when you get to the implementation point. Meanwhile, I'll continue to work on the spec.

vvanders commented 7 years ago

So I've got everything wrapped now. Started to take a crack at unit tests but Rust's multithread runner + actually binding sockets doesn't play nice, which is pretty much what I'd expect. That said VSCode + C/C++ extension actually lets me debug Rust + C inline together which is nifty.

I think I'm going to work on the API bits next, I want to get something a bit more in line with Rust's patterns.

Also had a quick question on the server C API. There's a check for it a client is connected but I can't seem to find a way to distinguish them, from what I could tell there could be a case where we have a disconnect + new connection that could look like the same client, ideas?

I was also wondering what the netcode_client_index() was for? I can't seem to find any usages or reference to client_id in the code.

Anyway, that's probably where I'll leave it for tonight, overall I like the spec. Simple and straightforward, was pretty easy to pick up and connection tokens are a nice idea.

gafferongames commented 7 years ago

netcode_client_index returns the index of the client on the server. which is basically the slot number, so on a 4 player game, you could have slot index 0, and be player 1 effectively. This is used by games to assign who is which player.

On the server there is a check to see if a server is connected, and then possibly a short time later, yes a different client could be connected in that slot. Would you like the ability to get the client id for a client slot as a public API? This is a globally unique identifier for clients, so it would be different if a different client is in the slot.

cheers

gafferongames commented 7 years ago

added server_client_id function. returns uint64_t unique client id for that slot so you can distinguish different clients in a slot before vs. after. -- cheers

vvanders commented 7 years ago

That totally makes sense on the client_index, I was thinking of some cases where we didn't use fixed slots and so my head wasn't in the right space.

Thanks for the client_id, should make it trivial to generate connect/disconnect events in addition to packet events. If I can get some time tomorrow I think I've got a good idea on what the Rust side will look API-wise. After that it's time to dig into libsodium and get started on a proper implementation.

gafferongames commented 7 years ago

I'm excited. Thanks for your work!

vvanders commented 7 years ago

Yeah, me too. I've been looking for a non-trivial Rust project that's small enough that I can complete it in a month or two of coding on the side. Doesn't hurt that the domain is interesting too.

I'm still working on the API but I started digging into libsodium/netcode internals and was wondering why it looks like the server sends down two full keys once a channel is established rather than using asymmetric key pair with the client sending up their public key? More for my education than anything else since I haven't worked with sodium before. Both approaches seem like they'd work but I would think that it would be a tad more secure to not share the key with the client in case of interception(I know sequence numbers help here too)?

gafferongames commented 7 years ago

I send down two keys because each packet needs to be encrypted with a nonce (eg. 64bit sequence) number. So there is one key for the client -> server direction, and another key for the server -> client direction. I felt this was more elegant than for example, having the high bit set on the sequence number for one of those directions, because you can never use the same nonce value more than once, or you risk exposing the private key.

gafferongames commented 7 years ago

ps. The symmetric keys are sent down to the client, but these keys are only the keys for encryption between the client and the server over UDP. The other key, the private key that generates the connect token, is known only to the web backend and the dedicated server instances.

The reason this works is that the connect token is sent down to the client over HTTPS (public and private portion), then only the private portion of that connect token (encrypted with shared private key), is sent over UDP as part of connection handshake.

cheers

vvanders commented 7 years ago

Yeah, that makes complete sense. I was just thinking it'd be slightly more secure to have asym from client->server and server->client in case the client was ever compromised they couldn't use it to generate fake packets(with new nonce) spoofed from the server. Principals of least access and all that but it's a pretty theoretical attack anyway.

Either way libsodium is pretty nice, the Rust bindings are a bit behind and don't use ChaCha20 so I'll probably just have to use raw FFI which is fine, it'll be easier to closely mirror what netcode.c already does.

gafferongames commented 7 years ago

Yeah, that makes complete sense. I was just thinking it'd be slightly more secure to have asym from client->server and server->client in case the client was ever compromised they couldn't use it to generate fake packets(with new nonce) spoofed from the server. Principals of least access and all that but it's a pretty theoretical attack anyway.

Oh I see what you mean here. I hadn't considered this as a viable attack. I consider the client fully compromised (running on PC), so I would expect an attacker on client would be able to modify code or memory locally to insert arbitrary packets or data into its queue anyway.

cheers

vvanders commented 7 years ago

Yeah, that's totally fair. Like I said it's a pretty rare attack angle.

I have to say, I'm really impressed with libsodium. Seems very well put together and tons of awesome auth/crypto tools.

gafferongames commented 7 years ago

I'm a huge fan of libsodium. It's a great library. Somebody should port it to Rust :)

vvanders commented 7 years ago

Yeah, there's a set of wrappers but they're still a bit lacking(no ChaCha20 secret box for instance).

So I'm starting to dig into the implementation, a couple questions about connection tokens if you don't mind:

timeout_seconds - this looks fixed, any reason to encode rather than control server-side? create_timestamp - Can't we just use UTC here and only store expire time? It looks like it's ignored in netcode.c:2608 userdata - Is this for future use or just to have a unique set of data for the challenge token? sequence - Is this something that you'd expect users to have domain knowledge around and specify? I saw in libsodium that they say that random_bytes is sufficient for nonce since the collision chance is small? I'm wondering if this would be better to not let the client specify in case they accidentally reuse a number.

Also let me know if any of this feedback is unwarranted. I'm mostly just trying to get context on the parts of the framework but I'm happy to also provide notes on things I as I encounter them if you think that'd be useful.

vvanders commented 7 years ago

Also, one last thing and then I promise I'll shut-up and get back to implementing. I see a bunch of ifdefs around endianess but I only see it used in some of the test suites. Is this transport considered endian specific?

gafferongames commented 7 years ago

The transport has little endian byte order. On a big endian machine, you should convert to/from little endian. See the functions to netcode_read_uint32 and so on, they're written in an endian agnostic way (they could be optimized to use the endian #ifdefs...) but currently they are just coded slowly so that the uint16_t/uint32_t values written/read are in little endian order.

For the other questions:

timeout seconds is fixed in the C reference implementation for simplicity, can be a server-side variable set to whatever you want. This is why it's in the connect token, so it can vary. I recommend a default of 5 seconds.

create timestamp for the token is used on the client so it knows when to give up, eg. if client connect goes on longer than expire timestamp - create timestamp in the connect token, the connect cannot possibly succeed, because the connect token has expired, so give up. This is a safety measure in case the client gets a long list of servers that are all going to timeout (eg. worst case timeout per-server before client realizes it needs to go onto the next one).

userdata is user defined data, 256 bytes large, that allows communication from the web backend to the dedicated server instance per-connecting client.

64 bits is too small to just roll random and have secure crypto, hence sequence # that increases with each connect token generated. The user should be aware of this, as they have to save that sequence # and restore if the server goes down and back up, eg. not start back up again at sequence 0 or they risk exposing the private key.

vvanders commented 7 years ago

Thanks for the details, not sure how I missed the endian thing, makes sense now.

On the nonce I though the point of those was to defeat replay attacks rather than leaking the key? The docs specifically reference randombytes_buf as a reasonable source:

https://download.libsodium.org/doc/secret-key_cryptography/authenticated_encryption.html The nonce doesn't have to be confidential, but it should never ever be reused with the same key. The easiest way to generate a nonce is to use randombytes_buf().

That said I don't think it's going to matter much in implementation terms just trying to see if it's worth making some simplifying assumptions. I'll go ahead and keep it in on the Rust implementation. I think I've got enough to sort out the token generation and connection handshake. It'll probably take me a few days to get something talking to each other but I'll keep you updated on my progress.

I think the Rust stuff is going to land really nice, I've already got a clean abstraction for swapping socket implementation for testing(that's totally zero-cost) and I think I can provide a simple API that'll be optionally plugable into futures/tokio to help drive adoption for people who are looking to use high performance IO.

gafferongames commented 7 years ago

To my knowledge it's generally accepted that random nonces at 64bits aren't safe.

From the libsodium docs:

The XChaCha20 variant, introduced in Libsodium 1.0.12. It can safely encrypt a practically unlimited number of messages of any sizes, and random nonces are safe to use.

I agree that a random nonce for the connect token would be much nicer from a user point of view. Something to consider in the future, and maybe worth upgrading to libsodium 1.0.12 to obtain, vs. the current sequence number, which is error prone and easy to screw up.

Back to rust:

I think the key thing that rust can provide is better safety, more performant I/O (C implementation uses sendto/recvfrom), and multithreading. I'm very interested to see where it lands.

cheers

gafferongames commented 7 years ago

Some more info on short nonces here:

https://download.libsodium.org/doc/key_derivation/index.html#nonce-extension

vvanders commented 7 years ago

Huh, not sure why they contradict themselves in the docs, better to err on the side of caution I guess.

Anyway, I've made some small progress on things. I've got ConnectToken completely written in native rust(aside from libsodium-sys but I don't expect a pure rust implementation yet) with this commit. The wrappers worked out kinda nice since it means I'm able to check confirmation with the C API inside the unit tests.

A couple more things came up as I was implementing the token:

  1. It looks like write_u* just does an endian-swap instead of a conversion, I had to use big endian serialization on x86 to be able to read the tokens from the C API cleanly.
  2. If you don't mind I might look into splitting out the network sim/token functions on the C side a bit more so that I can call them from the Rust unit tests. I really like the idea of the Rust tests validating that we can talk to C API based servers.
  3. Is there any reason for going with IPs instead of host names for the connection token? There's some cases where you'll put a geographic load-balancing DNS in front of a host name and when the server has to do the resolution you won't get an IP that's geographically close(think CDNs and the like).

Anyway, I'm trying to carve out time for this when I can. Next big thing is probably going to be getting the server socket setup + decode right. I'll probably see if I can talk to the C Client so I can verify my side of things before implementing a client stack as well.

gafferongames commented 7 years ago

Thanks.

1 - Sounds like a bug. Network order is supposed to be little endian. I'll fix this.

2 - No problem as long as the functions split up but remain in the c file only. setup a pull request and i'll review and accept. good idea to be able to communicate between rust and c for tests like this.

3 - Size mostly. The connect token needs to fit inside a conservative 1200 byte packet (MTU). I think max servers is set to 32, and I don't see 32 fqdns fitting easily inside the token (with the rest of the data), in all cases. Maybe the token has a list of IPs or one FQDN instead? That's doable.

Sounds good and thanks for working on this. I like how you can test the server, and run it against the C client to make sure it works. Helps avoid the chicken and egg (is the server broken, or the client?) which is always the painful thing when writing a client and server at the same time.

I guess fixing up the endian order to little endian is important, so I'll work on that now.

cheers

gafferongames commented 7 years ago

I've worked out why the docs are different:

This page describes the secret key box encryption:

https://download.libsodium.org/doc/secret-key_cryptography/authenticated_encryption.html

And it has crypto_secretbox_NONCEBYTES, which is 24 bytes, this is large enough for a random

The other pages describe the AEAD primitives, which have 64, 96, and the extended nonce. (chacha20). These are the crypto primitives used for the connect tokens.

That's why there is a difference in the docs. Not safe to use random nonce for the 64 or 96 bit AEAD, but safe with the new chacha20 primitive.

cheers

vvanders commented 7 years ago

Ah, that explains the difference. I wonder if there's not a big perf hit if it makes sense to move to XChaCha.

Sounds good, the endian thing is pretty easy to switch(Rust has a nice zero-cost ByteOrder crate that saves the trouble of all the bit-flipping).

I like the idea of one FQDN or raw IPs, seems like enough flexibility(and if they need more they can handle it when they give out the token). How would you feel about keeping everything as-is right now and then once I've got a working server/client we could consider a v2 token/etc? There might be a couple other design changes that come up which would be easy to do as just one large item.

gafferongames commented 7 years ago

Lets keep a list of things to improve, but keep these for a 1.1 version. Lets just get the current version fixed up, documented as a standard, and working across a number of languages. It's functional right now, and users will let us know what are the most important things to consider for 1.1.

gafferongames commented 7 years ago

OK. Everything should be little endian now.

vvanders commented 7 years ago

Awesome, sounds like a plan.

I've been digging into mio but they don't provide send()/recv() for UDP. I was thinking of using connected sockets so I can use completion IO to be able to parse work out to other threads. I just got one PR merged in miow and hopefully should be able to get the rest into mio from there and whatnot.

How strongly do you feel about being able to decode/process off-thread if users want to? It shouldn't be too tricky to work out as an opt-in feature. I'm thinking some sort of future.rs based API where the base just handles it in-line but can handle callbacks on completion for threadpools or the like.

gafferongames commented 7 years ago

I don't have strong opinions. Please implement the way you see as best fit for the language

gafferongames commented 7 years ago

Hi, I'm continuing work on the spec today. Let me know if there is anything I can do to help your effort on the Rust port.

cheers

vvanders commented 7 years ago

Good stuff.

Sorry I've been a bit absent, it's a combination of day job plus heading down a few dead-ends(I should have checked before I blindly assumed Serde would serialize large fixed-size arrays, it doesn't).

On the plus side I almost have connected UDP sockets merged into mio and I think I'm going to be close to getting a rough server here in the next week. Sadly there's not really any separable bits on the Rust side yet, once some of the server stuff is in place some more there might be if you want to get your hands dirty.

gafferongames commented 7 years ago

No worries! I appreciate your help. I'm going to keep pushing on the standard and see if I can finish a first draft today. cheers

gafferongames commented 7 years ago

I've managed to document the connect token data formats, and I made a few changes.

First, private connect token data is zero padded to 1024 bytes on write (for consistency), second, the connect token data buffer size was reduced from 4096 to 2048, third the # of server addresses per-token was increased from 16 to 32.

https://github.com/networkprotocol/netcode.io/blob/master/STANDARD.md

cheers

gafferongames commented 7 years ago

Mostly finished documenting token structures and packet structures, and a first pass of the client connection process completed today.

I made another small change, which was to reduce the challenge token buffer size to 300 bytes.

cheers

vvanders commented 7 years ago

Nice, that's pretty solid docs. I'll def do a pass on what I have and make sure it lines up. I've got packets serializing now(including a really nice way to safely do the variable byte len sequence value). I'm hoping soon that I can talk to the C client.

gafferongames commented 7 years ago

Yes it would be really helpful as you implement if you can compare your understanding, vs. the spec, vs. the reference code and if you notice any mismatch, or anything important that is not covered in the spec, let me know and I can fix it.

Up next I'll be working on documenting the server-side connection process, which I think is the most complex part, it's already somewhat documented here:

http://new.gafferongames.com/post/why_cant_i_send_udp_packets_from_a_browser/

gafferongames commented 7 years ago

I made another small change. The only reason for "connection denied packet" is server is full, so I removed the redundant reason enum. Connection denied packet doesn't have any per-packet type data now.

vvanders commented 7 years ago

Yup, that seems reasonable and is trivial for me to adjust.

gafferongames commented 7 years ago

I will try to make no more changes from now on. We should lock down 1.00, unless I discover an actual bug or something really egregious :)

vvanders commented 7 years ago

Yeah, sounds good.

Small progress on my end. Have all packet types serializing/deserializing internal to Rust.

I've integrated bindgen so I can now call into the functions as well as headers directly from Rust. I'm going to use it to do piece-wise testing of the C/Rust interop rather than one giant integration test.

gafferongames commented 7 years ago

Very nice

vvanders commented 7 years ago

Yeah, I've actually got it running against the C API right now with this commit (https://github.com/vvanders/netcode-rust/commit/de180a4abb2c2c319aa94b2aa18a7d2681340f97)

One thing that's tricky is some of the prointers aren't marked as const so a couple rust vars are a little unsafe(I have to mark things that are mutable which aren't). However I can't remember if C supports it cleanly or if that was a C++ extension into C. I couldn't get MSVC to cleanly compile it when I added const to a few headers.

Anyway not a big deal. Right now the tests are failing but I'm hoping in the next day or two I can sort out differences between Rust and C to get them inter operating cleanly. Nice part about this is the changes to the C API will fail the Rust unit tests so we can be sure everything is in sync.

vvanders commented 7 years ago

Finally have packets being read on the C API side, found a small bug in the process. It looks like the compressed sequence was pulling in bytes reversed so that we'd get a u64 that wasn't either little endian or big endian. I've got a quick fix on my end:

@@ -1673,7 +1673,7 @@ void * netcode_read_packet( uint8_t * buffer, int buffer_length, uint64_t * sequ
         for ( i = 0; i < sequence_bytes; ++i )
         {
             uint8_t value = netcode_read_uint8( &buffer );
-            (*sequence) |= ( ( (uint64_t) value ) << ( 8 * i ) );
+            (*sequence) |= ( ( (uint64_t) value ) << ( 8 * (sequence_bytes - i - 1) ) );
         }

However we'll probably want to do a pass on the rest of the lib, I haven't tested what I broke with the C API yet.

Also still don't have the connection packet decoding yet but I think I'm close.

gafferongames commented 7 years ago

I think the reversed bytes are by design.

Here is the relevant part of the spec:

"The sequence number is encoded by omitting high zero bytes, for example, a sequence number of 1000 is 0x3E8 in hex and requires only three bytes to send its value. Therefore the high 4 bits of the prefix byte are set to 3 and the sequence data written to the packet is:

0x8,0xE,0x3 // sequence bytes reversed for ease of implementation"

Does this match what you are seeing?

cheers

vvanders commented 7 years ago

Yeah, that looks like what I'm seeing. The issue is that if this were little endian they would be shifted from the top of the byte array. Right now they're shifted from the bottom so I can't use any integer representation of the nonce. Both little endian and big endian don't make a byte pattern that matches.

The code I have above works for big endian(I'm still on that since I don't want to pull latest changes until I have something working here, which I'll probably do soon once I have the connection packet working).

Is there any chance we can change the implementation to use a full little endian representation? I'm also fairly certain that the code would behave differently on big endian platforms.

gafferongames commented 7 years ago

The problem is I'm trying to solve here is to avoid sending the top 7 bytes in the uint64 when they are zero, which is common because sequence # starts at 0 and increases. This saves bandwidth.

I don't think the reference C code as written is different between big and little endian because it is working on the integer representation in a register, which is the same on both big and little endian.

So what I recommend is that the rust code you write to read back the bytes in the current reversed order, and |= them to the uint64 representation in register, like the C code does, and it should work fine.

The reason I recommend this is if I try to write the bytes in a little endian order, it gets weird, because little endian is reversed order in memory, so instead of dropping the high bytes in memory, its now dropping the low bytes in memory, and people's heads explode when they read the spec. "WTF am I doing here exactly?"

So I think this is the easiest way to do it, and we should stick with the spec.

cheers

gafferongames commented 7 years ago

Can you give me more details on this:

Yeah, that looks like what I'm seeing. The issue is that if this were little endian they would be shifted from the top of the byte array. Right now they're shifted from the bottom so I can't use any integer representation of the nonce. Both little endian and big endian don't make a byte pattern that matches.

So I understand what the implementation problem is on the Rust side.

gafferongames commented 7 years ago

As I read it, it seems that you are using a function to convert a byte array to an integer representation and you're not getting what you want out of it?

If you change the order of how you read the bytes in, I think you can treat the byte array as a big endian value and it will work, for example:

We write sequence 1000, which is 0x3E8 in hex:

It's reversed in memory:

0x8,0xE,0x3

should recover in the byte array to:

0,0,0,0,0,0x3,0xE,0x8

Which you should be able to convert to an integer representation treating it as a memory representation of big endian which it is.

This is what I expect the C code as written is doing. If it's not doing this, then that's a bug.

Let me know if this is or isn't what you are seeing.

cheers

gafferongames commented 7 years ago

At this point I think I might have screwed up the decode for the sequence # in the C implementation.

Let me know and I can fix it.