ssbc / ssb2-discussion-forum

not quite tiny, also not quite large

My 2 cents #7

Open gpicron opened 1 year ago

gpicron commented 1 year ago

Generally speaking, I think the simplicity is in the separation of concerns.

Some random thoughts:

  1. One of the weaknesses of SSB today is its reliance on long-term key pairs. In a few years, most of the crypto algorithms used today will be obsolete (if they are not already). The only real way to fight that is the ability to rotate keys often, especially when they are used often. For instance, we should use different keys for network peer identification and for feed writer signatures.

  2. What if, instead, upon creating a new Feed, the root message were a Feed Metadata Bundle containing:

    • creator pub key (optional if not the writer)
    • writer pub key (mandatory; the one that will sign subsequent messages in that feed; unique)
    • truncation policy (optional; specifies how and what to delete over time; can be specific per application type)
    • security policy (mandatory; specifies how and what makes a message considered valid, for instance making backlinks optional for that feed)
    • free key-value pairs for application-specific needs
    • signed by the creator (if any) or the writer. The creator pub key is a kind of backlink to some owner key pair used only for signing. From a logical point of view, this is like a MetaFeed message, but loosely coupled. The identifier of a feed is the hash of that message, so it can be shared freely even if it is encrypted for a private group.
  3. I think it is mandatory to layer the protocol. If you want to merge everything for SSB2, no problem for me; that will be a Social Media protocol on TCP. If I misunderstood the intent, I will stop here on SSB2 and continue on my path. It may also be that I misread the doc.

  4. For me, we must think in layers, in the concerns and responsibilities of each layer, and try to decouple them as much as possible, then focus on the efficiency of each. There are:

    • a logical model of what a Feed, a Message, and a Blob are, and the security guarantees we want to give them. I think it is important to maintain the event-chain integrity guarantee, otherwise this offers less than Nostr or the Bundle Protocol. That said, it may be relaxed for some cases by the application layer and interpreted by the application depending on the message. Meanwhile, I think it must live at the low level of the protocol stack, because it can inform the routing (EBT or whatever) how to optimize packet scheduling for the application. So I would propose not to impose a "previous" link in the "metadata" part, but instead to allow the application layer to specify 0 or more "backlinks". Depending on the context, these can be a link to the Feed Metadata Bundle (the root of the feed), one or more previous messages in the feed, or messages in other feeds (a tangle). This is sufficient for an EBT-like routing layer to perform efficient out-of-order scheduling of sends. It is also sufficient for the application layers to express updates, deletes, and CRDT chains, and to ensure eventual consistency.
    • I don't see any reason to impose a serialization format for the payload; the lower layers should never try to decode it. This is purely an application-layer concern.
    • the wire transport of data blocks (messages, blobs, etc.). The model of packets/segments of data can differ from the canonical model of Messages. There are plenty of techniques to make it more efficient at the byte level by reducing redundancies, but they depend on the underlying transport protocol: a session-context shared atom dictionary (like in Erlang), context multiplexing (like in TinySSB), etc. Of course, while designing those layers, we must take into account the resources of the device on which it will run and look for the right trade-off. Message transformations from storage format to wire format, based on context, cost CPU but reduce bandwidth. Compact serialization formats on disk are not necessarily efficient for querying and may cost more by requiring a lot of additional index structures; conversely, a format that is efficient on disk may be inefficient on the wire. Here an implementation can play with storage levelling: store recent messages, which have a high probability of being exchanged over the wire, in formats that cost little to transform, and progressively transform messages in the background into query-efficient formats. In summary, one size does not fit all; I think the logical decomposition matters more in this exercise today than the bytes.
  5. standards: It is interesting to define a new "standard" when it brings meaningful added value. But... it takes a lot of time to derive a good specification, a reference implementation, a consensus, and the kind of recognition that permits interoperability.

    • bipf: what is the added value compared to CBOR, Protobuf, etc.? It is a "standard" that lives only in the SSB world. How would you sell it to a dev who is not working on SSB?
    • ssb-bfe: again something that exists only in the SSB world and is difficult to extend. The intention was to have self-typed blocks of bytes. libp2p has defined a whole range of consistent multiformats, with clear specifications to express hashes, signatures, etc., following the principle of a binary-efficient format with a deterministic "uri" counterpart. I think it would make more sense to switch to those standards.
  6. encryption: the fact that a message is encrypted, and the encryption/decryption logic, is a service that should be available to the application layer as an SDK. But at the routing and transport layers, it may be useful to know that a message payload is encrypted (for instance, the routing algorithm may be configured to allow sending encrypted packets over non-secure transports: it may be considered safe to broadcast encrypted messages over UDP or LoRa, while for clear-text messages an authenticated connection with a trusted peer may be required).

Sorry, this is a bit random. But the most important thing for me is to clarify the direction you want to take for the SSB2 elaboration, to determine how much time I will personally invest in the SSB2 effort.

cblgh commented 1 year ago

I think it is mandatory to layer the protocol. If you want to merge all for SSB2, no problem for me.

@gpicron can you expand on the above?

fwiw i regard the contents of point 5 as opinions (bipf/ssb-bfe are not intended as external standards).

and so here's my opinion: if the dependencies in particular are core to your project and easy enough to implement from scratch then it doesn't matter that they aren't industrial-grade. sometimes this even helps adoption, at least when it comes to niche environments (lora being one such niche); uxn is a very good recent example of how easy portability causes software to spread across unlikely environments.

that said i agree that it could be interesting to re-evaluate at this stage: are these dependencies core pieces or can they be simplified somehow?

gpicron commented 1 year ago

@cblgh most of what I wrote is my subjective opinion. I don't pretend to have the truth ;-) So comment, disagree with me. No problem.

I was not saying that those SSB-only standards have no value. At some point they were created and explored, and exploration is key for innovation. But since we are talking about building SSB2, with the intention of having clear and documented specifications, I think we should re-evaluate their added value against existing, mature, documented, and widely implemented and tested standards from outside the SSB world. That would save us some work and give us more time to focus on what creates the most value in SSB.

Uxn is an excellent example: re-exploring what seemed settled, from another point of view, while its inventor also explored as much as possible of what was invented in the past to take its lessons. That is why I believe so strongly that SSB has the principles to redefine the way digital information is owned and diffused, and ultimately to replace the TCP/IP and Unix socket model, which implies a server-oriented architecture and centralisation.

SSB is a general exploration of that, and we learned a lot from it. Now that we are thinking about a new generation, we must take the lessons learned. One is that a protocol that is not clearly specified reduces adoption. Another is that, I think, we are not taking enough lessons from other explorations. For instance, the space industry has had to deal with long-delay, intermittent, low-bandwidth, opportunistic, and secure data transmission between low-CPU-power devices from the beginning; they have slowly specified and matured a standard, DTN BP7, since 2003. If you take the time to read the specs and look at some videos and implementations, you will find that they implement 80% of the data diffusion model that SSB offers. Building on top of the DTN architecture would reduce our work by 80% and let us focus on the added value.

staltz commented 1 year ago

One of the weaknesses of SSB today is its reliance on long-term key pairs. In a few years, most of the crypto algorithms used today will be obsolete (if they are not already). The only real way to fight that is the ability to rotate keys often, especially when they are used often. For instance, we should use different keys for network peer identification and for feed writer signatures.

I am somewhat in favor of doing this, but I think it's not a pressing problem. And if/when it becomes a pressing problem, we can launch new software that will then create a new feed with new keys. This is tough, because on one hand we are currently experiencing storage sustainability problems because some 6 years ago Dominic probably thought that ever-increasing feeds were "not a pressing problem". On the other hand, there are only so many problems we can solve at once: if we aim at solving every imaginable future problem, we will make little progress on any of them. So what I need to do is map all these problems and try to classify them by importance.

  • I think it is mandatory to layer the protocol. If you want to merge everything for SSB2, no problem for me; that will be a Social Media protocol on TCP. If I misunderstood the intent, I will stop here on SSB2 and continue on my path. It may also be that I misread the doc.

  • For me, we must think in layers, in the concerns and responsibilities of each layer, and try to decouple them as much as possible, then focus on the efficiency of each.

I understand the separation of concerns you want, and I like that too. BIPF encoding in minibutt is kind of cheating with ssb-db2 because that's the database encoding we use. Arj loves performance, so it might be tough to sell this separation of concerns. I don't mind BIPF that much, because it's a solved problem for us, but it may indeed be a minor obstacle for other implementors. This doesn't sound like a big problem to me, just a minor problem. Again, I might have to take the time to classify problems by importance.

That said, if we limit the size of each feed to 100MB max, and if replicate AOT only followed feeds, then performance is not that important anymore, because scale is so small. In that sense, we could relax performance tricks and aim for simplicity of implementation. I think JSON is fine. I think protobufs are fine. That's for the transport layer, of course. We can continue to use BIPF in ssb-db2 because it's optimized for reads of hierarchically-structured records.
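To illustrate the trade-off being discussed: BIPF's selling point over JSON is that a record can be queried in place, without deserializing the whole thing. The toy length-prefixed encoding below is not the real bipf module (which uses varint tags and nested types), just a minimal sketch of the mechanism:

```javascript
// Toy in-place format: [keyLen][key][valLen][val]... with 1-byte lengths.
// Only meant to show the "seek without full parse" idea behind BIPF.
function encode(obj) {
  const parts = []
  for (const [k, v] of Object.entries(obj)) {
    const kb = Buffer.from(k)
    const vb = Buffer.from(String(v))
    parts.push(Buffer.from([kb.length]), kb, Buffer.from([vb.length]), vb)
  }
  return Buffer.concat(parts)
}

// Scan for one field, skipping over the others without decoding them.
function seek(buf, key) {
  let i = 0
  while (i < buf.length) {
    const klen = buf[i]
    const k = buf.slice(i + 1, i + 1 + klen).toString()
    const vlen = buf[i + 1 + klen]
    const vstart = i + 2 + klen
    if (k === key) return buf.slice(vstart, vstart + vlen).toString()
    i = vstart + vlen // jump over the value we don't care about
  }
  return undefined
}
```

With JSON you pay a full parse per record per query; with an in-place format the database can skip straight to the field an index needs, which is exactly why ssb-db2 keeps BIPF on disk.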

standards

Some time ago cryptix recommended to replace Secret Handshake (SHS) with Noise. I was against it only for the reason of avoiding a breaking change in the network, thus a community split. But other than that, I'm favorable of using state of the art tools and standards.

But it's not so simple: we already have SHS working perfectly well in the (current) SSB stack, and using Noise would not be a matter of plug-and-play; it would require re-adapting several parts of the SSB JS stack. It's not a clear win, because with SHS / muxrpc / secret-stack / BIPF / BFE etc. we can just keep using them in production, since these parts are not actually broken, and thus they are a "problem" that ranks very low on my "importance scale" (I should really make this list of problems). We have to be very careful not to let SSB2 become "let's reform absolutely everything about SSB", because that's not the original motivator, and the more goals we add, the thinner we spread ourselves, and the lower our chances of making a change for real users. So we'll need a really moderate approach here, where we fix what's really broken and take the opportunity to add some "low-hanging fruit" breaking changes.

arj03 commented 1 year ago

Thanks for your input @gpicron, really appreciated.

What if instead upon creating a new Feed ...

This then becomes very similar to meta feeds, and there is a lot of complexity in such a model. We did model some of this key separation, including network identity and fusion identity.

I think it is mandatory to layer the protocol

I think this should be clearer in the document. Right now there is a bunch of stuff related to replication that ideally should not be here, but for now it is here just to make it a bit clearer how this could be used in practice. I'm not against separating the layers; I just need to experiment a bit more to get a better idea.

Also, I very much agree on not reinventing the whole thing. There is value in that, but it is a much larger project and I'm not sure we have the resources for it. At least I don't. There are some things from SSB that work really well, like rooms. It would be a shame not to leverage some of that existing code and infrastructure; it is also a way to scope this, and that should be clear from the get-go of the protocol. I'm of the opinion that the protocol layer is the sort-of "easy" part, in that you can invent different ones with different properties and use them higher up. For example, p2panda is a great way to leverage lipmaalinks for building efficient replication of documents. What I was mainly trying to achieve here was a very simple protocol for social media data where multi-device and edit/delete are a core part of the protocol. And I do think all of these p2p projects can learn from and influence each other. It's not winner-takes-all.

gpicron commented 1 year ago

This then becomes very similar to meta feeds, and there is a lot of complexity in such a model. We did model some of this key separation, including network identity and fusion identity.

Yes it is, but I think without being strictly modelled as a tree with links to leaves. Instead of being a message on a meta feed that you must replicate, it is the first message of a feed. There is potentially a link to some creator signing key, and the creator may announce its existence in another feed, but there is no direct coupling.

It also generalizes the replication algorithm: for 2 peers to sync, they reconcile their knowledge of DAGs rooted at some message, whether for feeds, threads, CRDTs, etc. And it makes creating Feeds cheap: you can have several Feeds with the same writer key, and you can rotate the writer key/algorithm.
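The generalized replication described above could, in its most naive form, look like the sketch below: each peer advertises the message hashes it holds under a root, and each fetches what it is missing. This full-set exchange is my own illustration of the eventual-consistency shape of the idea; a real protocol (EBT, set reconciliation) would be far more bandwidth-frugal:

```javascript
// store: Map<hash, { backlinks: string[], payload: any }>
// Hashes the remote peer has that we don't: these are what we request.
function missingFrom(localStore, remoteHashes) {
  return remoteHashes.filter((h) => !localStore.has(h))
}

// Naive DAG reconciliation between two in-memory stores.
function sync(storeA, storeB) {
  const wantA = missingFrom(storeA, [...storeB.keys()])
  const wantB = missingFrom(storeB, [...storeA.keys()])
  for (const h of wantA) storeA.set(h, storeB.get(h))
  for (const h of wantB) storeB.set(h, storeA.get(h))
}
```

Because messages carry backlinks rather than a single mandatory "previous" pointer, the same reconciliation works unchanged whether the DAG is a linear feed, a thread tangle, or a CRDT history.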

Also I very much agree on the don't reinvent the whole thing. There is value in that, but that is a much larger project and I'm not sure if we have the resources for that. At least I don't.

This is a bit chicken-and-egg: having our own standards makes us isolated and makes it difficult to find resources. Let's take an example. Suppose we decide, as a principle, to disconnect the network identity from the feed author on the device.
What does that mean in terms of changes to the current stack for a first version? Adding an optional RPC, called by the client, telling the server the feed key. That remains backward compatible.
Added value: not much in itself, a bit of improvement in confidentiality and traceability. What it allows: we can support libp2p connectivity in parallel to SHS/BoxStream without breaking anything. It lets us adopt some of that framework's side services (discovery, DHT, NAT traversal), etc.: lots of useful features that we then don't have to specify and implement ourselves. It makes SSB better known in the libp2p community. We could think about implementing the EBT principles as an alternative PubSub implementation (https://github.com/libp2p/specs/tree/master/pubsub). This potentially opens the door to collaboration and financing from a wider community.