As witnessed in https://github.com/libp2p/rust-libp2p/issues/3244#issuecomment-1352743020 adding a new protocol to the multiaddr implementation breaks communication between nodes using the new feature and nodes using an older version of this library. This would be warranted if understanding the meaning of a multiaddr is always mandatory, but there are cases (like the identify protocol) where that is not the case.
Breakage for such cases could be avoided by adding a new layer of validation: besides syntactically invalid and fully understood there could be a class that is syntactically valid but not fully understood.
Due to the design of multiaddr syntax, this is not a trivial question: a protocol segment may have arguments, like /tcp/1234, and without understanding the protocol name it is impossible to know the number of arguments. It would have been possible to choose different separator characters (like /tcp=1324 or some such), but that ship has sailed. So the only way to express syntactically valid but not fully understood addresses is to add a variant like Protocol::Unknown(Cow<'a, str>), where /tcp2/1234 would lead to two unknown segments (with tcp2 and 1234 payloads, respectively).
The alternative to handling this in this library is to always deserialize a multiaddr property as String first and then check whether it can be fully parsed if needed. However, given that multiaddr aims to offer an abstraction over various addressing schemes, I think it is reasonable to expect that this scheme itself is extensible and handles extensions in a graceful fashion.
As witnessed in https://github.com/libp2p/rust-libp2p/issues/3244#issuecomment-1352743020 adding a new protocol to the multiaddr implementation breaks communication between nodes using the new feature and nodes using an older version of this library. This would be warranted if understanding the meaning of a multiaddr is always mandatory, but there are cases (like the
identify
protocol) where that is not the case.Breakage for such cases could be avoided by adding a new layer of validation: besides syntactically invalid and fully understood there could be a class that is syntactically valid but not fully understood.
Due to the design of multiaddr syntax, this is not a trivial question: a protocol segment may have arguments, like
/tcp/1234
, and without understanding the protocol name it is impossible to know the number of arguments. It would have been possible to choose different separator characters (like/tcp=1324
or some such), but that ship has sailed. So the only way to express syntactically valid but not fully understood addresses is to add a variant likeProtocol::Unknown(Cow<'a, str>)
, where/tcp2/1234
would lead to two unknown segments (withtcp2
and1234
payloads, respectively).The alternative to handling this in this library is to always deserialize a multiaddr property as
String
first and then check whether it can be fully parsed if needed. However, given that multiaddr aims to offer an abstraction over various addressing schemes, I think it is reasonable to expect that this scheme itself is extensible and handles extensions in a graceful fashion.