libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io

connections/README.md is too biased and opinionated for library implementations #238

Open ShadowJonathan opened 4 years ago

ShadowJonathan commented 4 years ago

Currently, connections/README.md contains language that's specific to one implementation of libp2p, go-libp2p. In my view, the document also fails to address that streams are essentially still "connections" to protocols.

The document also fails to explain that multistream-select is a protocol used to make protocol selection more efficient than the default behaviour (without multistream), which, from memory, is: send/receive a null-terminated protocol byte-string; when the other node repeats this string, the protocol is agreed upon; if not, the connection is closed. The document does not address this, or at least does not mention it.

"Upgrading" connections is a strategy that a client can apply to the connection that has been established in between two nodes, however, this process is completely optional and non-binding per the definition of libp2p: modularity. The fact that this is indeed optional, or non-binding, is something that the document fails to address.

Lastly, at the end there are still some leftover concerns specific to the Go implementation of libp2p. In my opinion, these issues/concerns have no home in a general spec document and should instead be raised in their respective libraries (go-libp2p, in this example).

Please note: I am new to the libp2p ecosystem as far as the discussion and history of its development go, but I saw some issues with this document when it was handed to me as a reference for implementing it in py-libp2p. These are all personal thoughts and concerns that came up while reading that document; please correct me if any of my assumptions in these comments are simply false, or if my opinions are unfounded.

Stebalien commented 4 years ago

:+1:

> send/receive a null-terminated protocol byte-string, and when the other node repeats this string, the protocol is agreed upon, if not, the connection is closed.

That's basically just a single round of multistream select. Well, that's the multistream part of multistream select.

Otherwise, you're completely correct.

ShadowJonathan commented 4 years ago

I'm talking about the default multistream-select-less behaviour, the state and logic of a connection before that protocol has been activated, which is still something that needs to be mentioned.

Unless multistream-select will be baked into any default libp2p connection, the document still needs to address and define that/a default behaviour:

if initiator:
    send null-terminated protocol string
    wait for string echoed back, if malformed, close connection
    on connection close, retry via internal strategy logic
else:
    wait for bytes
    when null-terminator received, query protocol with received string from internal register
    when not found:
        close connection
    else:
        repeat protocol bytes with null-terminator
        further traffic is protocol-specific

(note, this is how I had it in my head for a while, please correct me if this is not the intended fundamental behaviour)
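
The pseudocode above can be turned into a small runnable sketch. It only illustrates the flow as described in this comment (null-terminated strings, echo to accept, close to reject) and is not the multistream-select wire format. The function names, the 1024-byte limit, and the use of `socket.socketpair()` as a stand-in transport are assumptions made for the example; the retry strategy is left out.

```python
import socket
import threading


def read_until_null(sock, limit=1024):
    """Return the bytes before the next null terminator, or None on EOF/overflow."""
    buf = bytearray()
    while len(buf) < limit:
        chunk = sock.recv(1)
        if not chunk:          # peer closed the connection
            return None
        if chunk == b"\x00":   # terminator reached
            return bytes(buf)
        buf += chunk
    return None


def initiate(sock, protocol):
    """Initiator: send the protocol string, accept only an exact echo."""
    sock.sendall(protocol + b"\x00")
    if read_until_null(sock) != protocol:
        sock.close()
        return False
    return True


def respond(sock, registry):
    """Responder: wait for a protocol string, echo it if registered, else close."""
    requested = read_until_null(sock)
    if requested is None or requested not in registry:
        sock.close()
        return None
    sock.sendall(requested + b"\x00")
    return requested  # further traffic would be protocol-specific


def negotiate_pair(protocol, registry):
    """Run both roles over an in-process socket pair and report the outcome."""
    a, b = socket.socketpair()
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(responder=respond(b, registry))
    )
    worker.start()
    result["initiator"] = initiate(a, protocol)
    worker.join()
    a.close()
    b.close()
    return result
```

Calling `negotiate_pair(b"/echo/1.0.0", {b"/echo/1.0.0"})` exercises the success path; requesting a protocol that is not in the registry exercises the close-on-unknown path.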

ShadowJonathan commented 4 years ago

@Stebalien Can I maybe get any acknowledgement on this? It's been 6 months, and in those months I've only seen proprietary protocols be approved and discussed on this repository, while this is an absolute core part of the spec which must be resolved before other parties can reliably work off of it.

Stebalien commented 4 years ago

Sorry, this fell off my radar. Please do bump conversations like this when you don't get a response.

> Unless multistream-select will be baked into any default libp2p connection, the document still needs to address and define that/a default behaviour;

multistream-select is a core protocol in libp2p and is required (for the moment, at least). Really, there are three core components that define libp2p:

Whenever a transport hands control of a stream of bytes to libp2p, libp2p will use multistream-select to select a protocol on that stream of bytes. In theory, the transport could tell libp2p to use a different protocol, but that would require a refactor of all implementations and some significant design changes.

> I'm talking about the default multistream-select-less behaviour, the state and logic of a connection before that protocol has been activated, which is still something that needs to be mentioned.

Could you expand on this?

ShadowJonathan commented 4 years ago

My assumption is that the state machine of a libp2p transport starts without multistream-select, basically in a state of "when bytes are received, echo them if they correspond to a protocol string; the protocol starts after the echo; otherwise disconnect", i.e. the code block I already posted:

if initiator:
    send null-terminated protocol string
    wait for string echoed back, if malformed, close connection
    on connection close, retry via internal strategy logic
else:
    wait for bytes
    when null-terminator received, query protocol with received string from internal register
    when not found:
        close connection
    else:
        repeat protocol bytes with null-terminator
        further traffic is protocol-specific

This means that the first ever state that a transport connection can be in is a non-multistream-select one, a "default" state which multistream-select then gets built upon as a protocol unto itself, which then negotiates further protocols efficiently and intelligently.

This is never pointed out in the docs, and the actual semantics and soft requirements of the default first state lock it into multistream-select, which is both unnecessary and restrictive.

If a freshly established libp2p connection really does not start with multistream-select, then the docs should reflect that and provide semantics for handling such connections (for example: the receiver does not send bytes and waits for the sender to send the first protocol bytes; if nothing is sent after 5/10 seconds, the receiver is allowed to send multistream-select bytes for backwards compatibility, or something similar). These are semantics, not actual properties of libp2p. libp2p's default initial transport state should not be a legacy bridge for multistream-select; it should be treated like any other connection: wait for simple protocol negotiation, initiator to receiver, and close the connection if the requested protocol is unavailable.
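
A rough sketch of the "receiver waits, then falls back" idea, under stated assumptions: the function name, the return labels, and the timeout handling are all hypothetical, the wait time is whatever the caller picks (the 5/10 seconds above are only a suggestion), and the fallback bytes assume the length-prefixed `/multistream/1.0.0` opening message.

```python
import socket

# Assumed opening message: length prefix 0x13 (19 bytes), the protocol ID, newline.
MULTISTREAM_HEADER = b"\x13/multistream/1.0.0\n"


def responder_first_move(sock, wait_seconds):
    """Wait for the initiator to speak first; fall back to multistream on silence.

    Returns a (mode, first_byte) tuple where mode is "initiator-led",
    "fallback", or "closed".
    """
    sock.settimeout(wait_seconds)
    try:
        first = sock.recv(1)
    except socket.timeout:
        # Nothing arrived in time: behave like a legacy peer and send the header.
        sock.settimeout(None)
        sock.sendall(MULTISTREAM_HEADER)
        return ("fallback", b"")
    sock.settimeout(None)
    if not first:
        return ("closed", b"")
    # The initiator spoke first; the caller continues reading its protocol string.
    return ("initiator-led", first)
```

The point of the design is that the receiving side never sends pre-emptively, so whatever the initiator sends first decides which negotiation scheme is spoken.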

ShadowJonathan commented 4 years ago

I see multistream-select as a libp2p protocol unto itself, but the document should set the stage for connections and describe its properties instead of defining "common semantics". If the absolute core of the library cannot be described easily, then libp2p cannot be abstracted efficiently, its interoperability is muddied, and it cannot act as a pure platform for every other p2p solution out there.

(Sorry if this sounds very obsessed or specific to this issue, but I just think that documentation like this should be updated with more clear and core logic of libp2p, instead of recommended semantics, like the difference between MUST and SHOULD on RFCs, for example)

Stebalien commented 4 years ago

> My assumption is that the state machine of a libp2p transport is without multistream-select first, basically being in a state of "when received bytes, echo bytes if bytes corresponds to protocol string, protocol starts after echo, otherwise disconnect", basically that codeblock I already posted:

That "echo back" protocol is part of multistream, that's why it's not mentioned separately in the docs. When a libp2p transport hands off a byte stream to libp2p, the first protocol spoken is implicitly multistream.

Basically, we need to start somewhere. There needs to be some base protocol for specifying the next protocol we're going to speak.
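
For reference, a rough sketch of how multistream-select 1.0.0 frames the messages that make up this "echo back" exchange: each message is prefixed with its length as an unsigned varint and terminated by a newline, the opening message is `/multistream/1.0.0`, and `na` signals an unsupported protocol. The helper names below are illustrative only and not taken from any libp2p implementation.

```python
def encode_varint(n):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_message(text):
    """Frame one message: varint length, payload, trailing newline."""
    payload = text.encode("utf-8") + b"\n"
    return encode_varint(len(payload)) + payload


def decode_message(data):
    """Decode one framed message; return (text, remaining_bytes)."""
    length, shift, pos = 0, 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated length prefix")
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    payload = data[pos:pos + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("truncated or unterminated message")
    return payload[:-1].decode("utf-8"), data[pos + length:]
```

A listener that supports the requested protocol echoes the same framed message back; one that does not replies with the framed `na` message instead.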

ShadowJonathan commented 4 years ago

Yes, but that protocol might be replaced or upgraded in the future. Muddying the waters by having both sides echo the string at once could make one of them break the connection because it receives a "malformed" connection string. My question and request is for this to basically be a "normal" protocol selection environment, and not an exclusive bridge to multistream-select. Maybe there will be a multistream-select v2, maybe another company will want to implement its own multistream-select, maybe there's a problem with the current multistream-select which needs to be fixed but can't be, because of the large existing implementation base which automatically "locks in" the connection to multistream-select v1 (because both sides would need to immediately hold off on pre-emptively sending that string).

So that part needs to be abstracted and documented as a separate state in the internal connection state machine, before multistream-select comes into play. No matter how closely glued and linked the two are, it's still a stage in connection handling that isn't directly multistream-select and instead starts that protocol. It needs to be documented and cleared up so that library implementations can interoperate even at the lowest level, and so that the whole stack can be replaced if so desired.

This is just my first issue, but I also want to re-emphasize the other issues I noted in my first comment: modularity and library agnosticism. This repository does not serve any single libp2p implementation; it lays out a specification, kind of like RFCs. It should therefore not tie itself to any one programming language paradigm, and should instead give a completely abstract and logical description of how libp2p operates conceptually, leaving implementation up to individual libraries. Those are my views on this matter, and I hope this is also the case for the rest of Protocol Labs, because otherwise this repository is not a specification but a "design document collection" for just a few select libp2p implementations.

@vyzo @yusefnapora

raulk commented 4 years ago

@ShadowJonathan your comments are welcome. Indeed, my position is that connection bootstrapping and protocol selection are two very distinct mechanisms. In current libp2p, they are both mediated by multistream-select v1. In a near-future iteration of libp2p, I am advocating for separating them and establishing a dedicated connection bootstrapping flow. That flow would include the upfront exchange of connection and stream “capabilities”. Such capabilities may be: multiplexers, compressors, erasure coding schemes, AND protocol negotiation schemes, out of which the default one would be multiselect v2 (a future non-naive, efficient stream protocol negotiation scheme). This would be the right extension point to introduce implementation/app-specific protocol negotiation schemes that take priority over the default, fallback multiselect v2, if both peers support them.
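
The "takes priority over the default if both peers support it" rule can be sketched as follows. This is purely illustrative: the function name, the scheme identifiers, and the choice to let the local preference order decide are assumptions, since no capability exchange format has been specified in this thread.

```python
# Hypothetical identifier for the default, fallback negotiation scheme.
DEFAULT_SCHEME = "/multiselect/2"


def choose_negotiation_scheme(local_prefs, remote_supported, default=DEFAULT_SCHEME):
    """Pick the first locally preferred scheme the remote also supports.

    Any custom scheme shared by both peers wins over the default; if none is
    shared, fall back to the default scheme.
    """
    remote = set(remote_supported)
    for scheme in local_prefs:
        if scheme != default and scheme in remote:
            return scheme
    return default
```

For example, two peers that both advertise a hypothetical `/acme/select/1` would use it, while a peer pair with no shared custom scheme would end up on the default.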

ShadowJonathan commented 4 years ago

How would this be backwards compatible with the current connection method? Or will this be a hard cutoff? Or maybe a hybrid of the two, where v2 waits RTTx2 before engaging in this upgraded mechanism (so as not to collide in flight with the v1-sent multistream-select protocol string)?

In any case however, maybe this could be modularized further by having a "main header" followed by "type headers" + arrays, which are integers followed by null-terminated or length-prefixed strings for compactness.

This "main header" (a string) could allow many different "type table" types to be recognised (for instance, libp2p defines their own type table (integer -> type) to be used, and another company can make one of their own for their own usecase, the framework will be the same, but the difference can be for highly specialised circumstances).

Example:

"lp2p"
<1> (mplex)
/p2p/mplex/1.0.0
/p2p/mplex/2.0.0
<2> (swarm)
/p2p/swarm/1.0.1
/eth/exchange/0.5
<4> (security)
/p2p/tls/3
/p2p/secio/6.7
<99> (selection)
/p2p/multiselect/2
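
One possible byte layout for the example above, as a sketch only. The draft does not fix an encoding, so everything here is an assumption: a null-terminated main header, then for each section a one-byte type ID, a one-byte entry count, and that many null-terminated protocol strings. The function names are hypothetical.

```python
def encode_table(header, sections):
    """Serialize a main header plus {type_id: [protocol, ...]} sections."""
    out = bytearray(header.encode("ascii") + b"\x00")
    for type_id, protocols in sections.items():
        out += bytes([type_id, len(protocols)])  # assumed: both fit in one byte
        for protocol in protocols:
            out += protocol.encode("ascii") + b"\x00"
    return bytes(out)


def decode_table(data):
    """Parse the layout produced by encode_table; return (header, sections)."""
    header, _, rest = data.partition(b"\x00")
    sections = {}
    pos = 0
    while pos < len(rest):
        type_id, count = rest[pos], rest[pos + 1]
        pos += 2
        protocols = []
        for _ in range(count):
            end = rest.index(b"\x00", pos)  # next null terminator
            protocols.append(rest[pos:end].decode("ascii"))
            pos = end + 1
        sections[type_id] = protocols
    return header.decode("ascii"), sections


# The table from the example above, keyed by its type IDs.
EXAMPLE = {
    1: ["/p2p/mplex/1.0.0", "/p2p/mplex/2.0.0"],
    2: ["/p2p/swarm/1.0.1", "/eth/exchange/0.5"],
    4: ["/p2p/tls/3", "/p2p/secio/6.7"],
    99: ["/p2p/multiselect/2"],
}
```

A different main header (anything other than `lp2p`) would tell the receiver to interpret the type IDs against a different type table, which is the extension point the draft is after.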

Sorry to instantly write out a draft like this, but I think the methodology behind this would be to make it as extensible and "open" as possible for completely separate and self-built stacks, and to have libp2p be a neutral, agnostic framework for which basic and default semantics are provided to work "in general", but whose power can be utilized by any project/organization to build its own stack (from self-made protocols or existing ones) and still reliably interconnect with the existing network. IMO that's the strength of libp2p, and it should hold onto that to give anyone the power to use its keys, while not rubbing shoulders.

God knows how many separate bottom-level p2p protocols exist nowadays. It would be a dream for every one of them to build, interconnect, and interoperate like Legos, with the resiliency and usability of universally-existent protocols providing a value greater than the sum of its parts to every node in the network, while keeping the specificity and "selectability" of ultra-specific protocols between any 2 or more nodes in that network to serve their use case. That's my dream.

ShadowJonathan commented 4 years ago

One more idea I came up with just now is that this new "multistream-select V2" or Type Table method would still come from a basic protocol, but a small-sized one to save bytes (/core/typetable/1), so that the idea of "first-order-of-business protocol negotiation" isn't broken, but this could become another efficient and widely used protocol as well. libp2p libraries that support it would actually adhere to the "wait if receiver" policy, and not immediately send out a string from both sides that locks the network into only one first-order negotiation protocol.

To have newer versions of libp2p implement that alternative, or at least adhere to holding off on sending strings if it's the receiver end (by specifying it in this document) could be a huge step forward to bring in some additional ideas and opportunities.

Stebalien commented 4 years ago

> Yes, but that protocol might be replaced or upgraded in the future

I agree, but we need to start somewhere. At the moment, that somewhere is multistream. It sounds like you're proposing an alternative protocol declaration/negotiation protocol.

As for upgrades, the current proposal is to allow a transport to specify the starting point. That is, a transport could specify which protocols the peer speaks and the protocol selection protocol that should be used. The transport would learn this information during the bootstrap process (e.g., security handshake) as @raulk describes.

ShadowJonathan commented 4 years ago

> It sounds like you're proposing an alternative protocol declaration/negotiation protocol.

I'm not; the original intent of this issue is to have the document clarify and separate the initial protocol negotiation state from multistream-select, and open up that state for more than just multistream-select. I just got caught up in suggesting some ideas I had after Raul mentioned an alternative way of negotiating. So that part is separate from this issue.

My only real demand is for the (softly mandatory) "both sides send a string" semantic and recommendation to be dropped, and for it to be mandatory that the receiving end waits for the connection initializer to send a protocol string. This way, the earliest stage of the connection can be altered to fit a stack in the future.