status-im / nim-chronos

Chronos - An efficient library for asynchronous programming
https://status-im.github.io/nim-chronos/docs/chronos
Apache License 2.0
353 stars 51 forks source link

Add `sock` to stream connect parameters #344

Closed diegomrsantos closed 1 year ago

diegomrsantos commented 1 year ago

In order to do hole punching, a peer must be able to bind to the same port as it is listening to during the dialling synchronization.

cheatfate commented 1 year ago

Now everything looks good, the only issue i see is that its possible to create socket which is bound to IPv6 and call connect with IPv4 address. I think we should get some work done on this case. There actually 4 cases, but the most interesting only 2:

  1. socket is IPv4 and address is IPv6 - we should check if address is IPv6 mapped address and if so, we should unwrap it and obtain IPv4 address and connect to it, or we should just raise specific exception (maybe Defect or maybe any other specific error based TransportError).
  2. socket is IPv6 and address is IPv4 - we should transform address into IPv6 mapped address and try to establish connection with it, or we should just raise specific exception (maybe Defect or maybe any other specific error based TransportError).

Why i think this is important, because different OS will start generate different TransportOSError which could become confusing at some cases and its why i think we should avoid it.

cc @arnetheduck @Menduist @diegomrsantos

Menduist commented 1 year ago

@cheatfate good point, I will take a look. Maybe you could add some helpers to #354 for this, to check if IPv6 addresses are in the "ipv4 mapped range", and helpers to switch between the two?

cheatfate commented 1 year ago

This helpers are already in there

toIPv6() https://github.com/status-im/nim-chronos/pull/354/files#diff-2e3fd49bf3dd51f40ff2993b220dd88832cbc24aa4f70d543febf19857eecebbR186

isV4Mapped() https://github.com/status-im/nim-chronos/pull/354/files#diff-2e3fd49bf3dd51f40ff2993b220dd88832cbc24aa4f70d543febf19857eecebbR207

toIPv4() https://github.com/status-im/nim-chronos/pull/354/files#diff-2e3fd49bf3dd51f40ff2993b220dd88832cbc24aa4f70d543febf19857eecebbR221

Problem is, what should we do? Do we need to raise an assert or we should perform some automatic actions...

Menduist commented 1 year ago

Indeed, I missed them, sorry!

There is actually 3 place where the domain needs to match: createNativeSocket bindSocket connect

If there is a mismatch somewhere, it will raise (22) Invalid argument [TransportOsError] on linux, or (1214) The format of the specific network name is invalid. [TransportOsError] on windows

Menduist commented 1 year ago

@cheatfate anything else blocking here? Do you want to create a specific TransportOsError exception and check for domain mismatchs in bindSocket and connect?

arnetheduck commented 1 year ago

The greater problem, which somewhat goes in hand with ipv6 support, is the ownership question: after passing in a random socket, who is the owner of that socket?

This has an effect on chronos in several ways, but above all, which assumptions chronos can make about it: is it blocking? is it dual stack? is it bound? what's the buffer size -> this in turns affects the API that chronos offers, or how "thin" of a layer it is.

If a user passes in a socket which is blocking, should chronos force it to be non-blocking? This is a requirement for chronos to work, but whose responsibility is it to set the options? Should we instead expose some other socket-creation API that ensures that the socket has been created appropriately? Who closes it? Should there be a way to release the socket back to the application? what happens with in-flight requests?

Before merging this PR, this architectural question should be answered, else we end up with an API that may become unsupportable.

Menduist commented 1 year ago

Note that this api type isn't new in chronos, it's based on the datagram transport that already allows to pass in an existing socket https://github.com/status-im/nim-chronos/blob/77ffff322c593a0909166691c65e58cbbcd92170/chronos/transports/datagram.nim#L652

cheatfate commented 1 year ago

After a long discussion of this PR with @arnetheduck, he points my attention that this PR has significant flaw. And it is my fault that this flaw appears (createServer() definition). And i think because of this i misguided you about same functionality for connect.

The problem i'm trying to describe is owner transfership.

When chronos creates the socket it could manage it and definitely should close it by itself. Also its possible to perform any operation on every OS chronos supports.

When application creates the socket and transfers this socket to chronos via createStreamServer() or connect() - chronos should use it without any modifications and definitely should not close it. I'm trying to keep such logic everywhere in chronos, but looks like i have missed this specific case. The problem become more visible on Windows where we can't just unregister socket from IOCP queue handle, the only way to unregister handle/socket from IOCP is to close it.

So the only way to go further is to add all the options and addresses into the connect call and remove socket argument/option from both connect() and createStreamServer().

arnetheduck commented 1 year ago

Relevant to this PR as well: https://github.com/status-im/nim-chronos/pull/25

Re ownership, we could indeed introduce that as an "official" API, meaning that chronos takes over ownership and becomes responsible for closing later on: it would also solve the problem of socket options where chronos would set all the options it needs on the socket.

The key point would be that this needs to be made consistently clear in the API in some way: naming, sink, a special "transfer" type like unique_ptr in C++ etc - not sure what the best approach is, but as long as it is done consistently for all API, it's a possibility to explore.

The lack of solid ownership tracking in Nim will indeed open up the possibility for users to abuse ownership transfer (for example by using the socket after transferring ownership) - this is a general problem however and as long as the method we choose is well-documented and explicit, this risk is acceptable.

Menduist commented 1 year ago

362 got merged instead