ssbc / ssb2-discussion-forum

not quite tiny, also not quite large
17 stars · 1 fork

log verification #13

Open ahdinosaur opened 1 year ago

ahdinosaur commented 1 year ago

forking from https://github.com/arj03/minibutt-spec/issues/10#issuecomment-1455837865:

just a side note, if we add checkpoints, i wonder if there's a way to use checkpoints as certificates to regain log verification, a la bamboo log verification using lipmaalinks. since as i understand it so far, ssb2 gives up strict log verification in exchange for partial replication. but can we have both? or at least a design where log verification is optional but doesn't require complete replication.

a simple idea would be having multiple cycle periods in the checkpoints. in the shortest cycle, each checkpoint points to the previous checkpoint. in the next cycle (with a longer period), every 10 checkpoints is a deca-checkpoint, and every checkpoint points to both the previous checkpoint and their previous deca-checkpoint. and every 100 checkpoints is a hecto-checkpoint, and every checkpoint points to all of the previous checkpoint and the previous deca-checkpoint and the previous hecto-checkpoint, and so on...
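to make this concrete, here's a tiny hypothetical sketch (not from any spec; the cycle periods 1/10/100 are just for illustration) of which earlier checkpoints a checkpoint at sequence `n` would point to:

```javascript
// hypothetical sketch: which earlier checkpoints checkpoint `n` links
// to under the multi-cycle scheme (periods 1, 10, 100 assumed).
function checkpointLinks(n, periods = [1, 10, 100]) {
  const links = new Set();
  for (const p of periods) {
    // the most recent earlier checkpoint whose sequence is a multiple of p
    const target = Math.floor((n - 1) / p) * p;
    if (target > 0) links.add(target);
  }
  return [...links].sort((a, b) => a - b);
}

// e.g. checkpoint 437 would link to checkpoint 436 (previous),
// 430 (previous deca-checkpoint), and 400 (previous hecto-checkpoint).
```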

with some thought, there's probably a not-so-complicated way to encode a structure like lipmaalinks in the checkpoints, where there's a shortest path from every checkpoint to the root message, and where that shortest path distance grows logarithmically with the number of messages.

anyways, just some musings, cheers. :purple_heart:

ahdinosaur commented 1 year ago

with some thought, there's probably a not-so-complicated way to encode a structure like lipmaalinks in the checkpoints, where there's a shortest path from every checkpoint to the root message, and where that shortest path distance grows logarithmically with the number of messages.

oh right, i might understand how now. we could use lipmaalinks basically as described here

(image: lipmaalinks graph)

but instead of each message having a lipmaalink, only checkpoints have a lipmaalink. so instead of the 26th message pointing to the 13th message, it's the 26th checkpoint pointing to the 13th checkpoint.

ahdinosaur commented 1 year ago

also turns out Aljoscha wrote a proof about this: https://aljoscha-meyer.de/linkingschemes

staltz commented 1 year ago

We could do this, but what would it gain us and what would it cost us? The gain (log verification) seems related to ensuring that you didn't miss a message or that the replicating peer didn't intentionally omit some message, which is good, and I think we will preserve this property anyway. So it's more about the performance of doing so.

On performance, I don't think sig-chain verification is a bottleneck in the current SSB. Attempting decryption of encrypted messages is heavier. Creating indexes is also pretty heavy. And if we have a much smaller dataset, then performance is even less of a concern. I'm aiming for apps to use at most 100MB (by default, this would be configurable), which means that verifying feeds is not going to be a performance problem. You can always make it faster, sure, but that comes at a cost.

As far as I see, the cost of lipmaalinks is a more complex ssb-db2 implementation. arj and I looked at it once and it would be a headache to implement. It's much easier and simpler to just keep track of the previous, and that's it.

staltz commented 1 year ago

As far as I see, the cost of lipmaalinks is a more complex ssb-db2 implementation. arj and I looked at it once and it would be a headache to implement.

Clarifying this, the difficulty wouldn't be in the database layers of ssb-db2, it would mostly happen on the top-most API level, where we take a new message and in order to append it to the log we have to fetch one or several older messages way back in the feed. This is more complex than peeking at the latest message for that feed ID.

Be it SQLite or ssb-db2, you need a lookup from feed IDs to their latest msgs. That could be a SQLite table or (as it is right now) an ssb-db2 leveldb. But adding lipmaalinks means you can't rely on that table alone, you need to hop around in the feed.
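A rough sketch of that difference, using a made-up in-memory feed array rather than the real ssb-db2 API (everything here is illustrative, not actual project code):

```javascript
// With backlinks only, appending needs just the latest message of the
// feed. With lipmaalinks, it also needs a random-access read into the
// feed's history (here: indexing the array by sequence number).
function appendWithBacklink(feed, msg) {
  const latest = feed[feed.length - 1];
  if (latest && latest.key !== msg.previous) throw new Error('broken chain');
  feed.push(msg);
}

function appendWithLipmaalink(feed, msg, lipmaaSeq) {
  const latest = feed[feed.length - 1];
  if (latest && latest.key !== msg.previous) throw new Error('broken chain');
  // the extra lookup far back in the feed:
  const target = feed[lipmaaSeq - 1];
  if (!target || target.key !== msg.lipmaalink) {
    throw new Error('bad lipmaalink');
  }
  feed.push(msg);
}
```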

arj03 commented 1 year ago

I think lipmaalinks are great and you can build really nice stuff on top of them. But if we were to go that route, why not just use p2panda? It already implements bamboo and is written in rust :>

ahdinosaur commented 1 year ago

The gain (log verification) seems related to ensuring that you didn't miss a message or that the replicating peer didn't intentionally omit some message, which is good, and I think we will preserve this property anyway.

this makes sense for receiving a sequence of messages, but if you ask for a slice of messages (during sliced replication), how do you know the first message of the slice is a verified message in the log?

that's why i was thinking about only using log verification on the checkpoint messages, since they'd be the start of a slice of messages. so we could get log verification and partial replication.

Clarifying this, the difficulty wouldn't be in the database layers of ssb-db2, it would mostly happen on the top-most API level, where we take a new message and in order to append it to the log we have to fetch one or several older messages way back in the feed. This is more complex than peeking at the latest message for that feed ID.

yeah, if this is too much work it's not worth doing, especially when there are enough problems to solve already. the benefit of only doing this for checkpoint messages is that we wouldn't need to do this all the time. however we would still need to store something in the database to keep track of possible lipmaalinks that a new checkpoint might reference. also more things to maintain, more possible sources of bugs. :lady_beetle:

staltz commented 1 year ago

this makes sense for receiving a sequence of messages, but if you ask for a slice of messages (during sliced replication), how do you know the first message of the slice is a verified message in the log?

Interesting question, the message is signed so at least you know the author meant to have it, but why do we need to connect it to the past?

gpicron commented 1 year ago

Because it offers provable causality and guarantees eventual consistency. If you relax these features, you've got Nostr.

staltz commented 1 year ago

Nostr has no previous linking, so it suffers from omitted-record attacks, which is not the case with minibutt because we have previous linking.

ahdinosaur commented 1 year ago

Interesting question, the message is signed so at least you know the author meant to have it, but why do we need to connect it to the past?

ssb1 provides a security guarantee that even if someone else has your private key, it's nearly impossible for them to impersonate your previous log messages with a fake feed, they can only add new valid messages.

this is because when someone replicates your log, they start from the first message and add each new message, with verification. so ssb1 has "total log verification", every new message is verified to come from the original log. for the adversary to impersonate your previous log messages, they'd need to impersonate a new log from scratch, which would only work on those who have never replicated your feed. anyone who has ever replicated your feed would ignore the new invalid messages, causing a network partition for that feed.
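a minimal sketch of that replay-style verification (signature checks omitted, and `key` standing in for the message hash; the message shape is just for illustration):

```javascript
// "total log verification": replay a feed from its first message,
// checking each message's sequence number and previous-link. an
// impersonated feed fails at the first message that diverges.
function verifyLog(msgs) {
  let prevKey = null;
  msgs.forEach((m, i) => {
    if (m.sequence !== i + 1) throw new Error('bad sequence');
    if (m.previous !== prevKey) throw new Error('broken previous-link');
    prevKey = m.key;
  });
  return true;
}
```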

also as noted in ssb-buttwoo-spec, we could even improve on ssb1 log verification if "the friend graph [includes] the root hash besides the feed, otherwise an adversary (given the private key) could still create a fake feed." so in this improved design, when you learn about a new peer, you'd learn their root hash, so on initial replication you'd ignore any other root hash and avoid any impersonated logs.

for ssb2 as described so far, it should be much easier for an adversary (given the private key) to impersonate your previous log messages, because with ssb2 sliced replication there's no guarantee that a slice is verified back to the first message in the feed (the root hash), we are only verifying against previous messages. so an adversary should be able to impersonate a slice that doesn't start from a previous message that peers already have.

now whether the "total log verification" security guarantee is worth the cost, that's up for discussion.

gpicron commented 1 year ago

Nostr has no previous linking, so it suffers from omitted-record attacks, which is not the case with minibutt because we have previous linking.

Sorry, I misunderstood. I thought you were suggesting to drop the previous completely. If I understand well, you are suggesting not to verify the first message of a slice.

In a way it is equivalent to saying that every message is an "anchor". This is not really a problem in the context of social media (the reason why Nostr works), and we already do this when we gather messages related to a thread from feeds we don't replicate.

Note: the concept of slice replication itself implies a linear chain of messages, like in ssb1 (no forks).

So I come back to the principle of the anchor, relation backlinks, and tangle replication over relations, because I think that is the right tradeoff.

(image)

That is very similar to some of the concepts of metafeeds, but in practice you only need one DAG-replication EBT-like algorithm and one simple query to discover anchor graphs from a single anchor link.

arj03 commented 1 year ago

whether the "total log verification" security guarantee is worth the cost

Well put @ahdinosaur. I'm sorry for my earlier comment, I see now that it could have been received negatively. But it was exactly this point that I was trying to address. Do all the messages back have to point to the same root? You can see some of the same tension in various crypto projects, where the only way to truly make sure the chain is correct is to know the start and verify everything on your own machine. I would like us to end up in a situation where we also use the social fabric to know what is valid or not. Even if we include the root hash of the first message (or some linking scheme), the key can still be compromised. The current approach makes sliding windows easier. As long as there is consistency within our current view, things are fine.

staltz commented 1 year ago

ssb1 provides a security guarantee that even if someone else has your private key, it's nearly impossible for them to impersonate your previous log messages with a fake feed, they can only add new valid messages.

(...)

now whether the "total log verification" security guarantee is worth the cost, that's up for discussion.

Now I see what you mean, thanks for the clarification! To be honest I haven't thought about creating resilience post-key-compromise, I've been treating key compromise as game over. It is tricky, and will get more tricky once your keypair can be put on several devices. We need to find ways of mitigating that problem.

That said, it's an identified problem but doesn't rank too high in my list of problems. We have to keep track of it, but give priority of mental space to other problems, which are more existential. All problems are problematic but some problems deserve special attention and focus. We're focused on (storage/memory/bandwidth/patience) sustainability, irrecoverably forked feeds, and a few easy low-hanging fruits such as same-as.

gpicron commented 1 year ago

@staltz @arj03 relying on the same keypair for several feeds just increases the risk and the consequences.

AljoschaMeyer commented 1 year ago

but instead of each message having a lipmaalink, only checkpoints have a lipmaalink.

To give you a precise and simple proposal of how this can be done:

Let f be the function to compute normal skiplinks. Choose some number k, then the following function f_k computes skiplinks such that you have logarithmic paths between linked lists of size k (set k = 1 to obtain the normal case):

```javascript
function f_k(n) {
  // n is a natural number; k is fixed in the surrounding scope
  if (n % k === 0) {
    return f(n / k) * k;
  } else {
    return n - 1;
  }
}
```

This way, only every k-th message (slightly fewer, actually) has a skiplink different from its backlink. In exchange, the certificate pools of most messages (all but those divisible by k) grow by k messages. So whether this is worth it depends on whether you expect peers to store many consecutive messages. If you expect them to cherry-pick lots of individual messages, k should be small. If you expect them to store a few large slices of a log, k can be large. For precise calculations, do a precise calculation =P
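For concreteness, here is a sketch in js that plugs one common formulation of the lipmaa skiplink function in as f (treat the exact implementation as illustrative, not normative):

```javascript
// the lipmaa skiplink function `f`: entry n skips back to lipmaa(n),
// giving logarithmic-length shortest paths back to the start.
function lipmaa(n) {
  let m = 1;
  let po3 = 3;
  let u = n;
  // find the smallest m = (3^j - 1) / 2 with m >= n
  while (m < n) {
    po3 *= 3;
    m = (po3 - 1) / 2;
  }
  po3 /= 3;
  if (m !== n) {
    while (u !== 0) {
      m = (po3 - 1) / 2;
      po3 /= 3;
      u = u % m;
    }
    if (m !== po3) po3 = m;
  }
  return n - po3;
}

// f_k from above: only every k-th entry gets a real skiplink.
function lipmaaK(n, k) {
  return n % k === 0 ? lipmaa(n / k) * k : n - 1;
}

// lipmaa(26) === 13 (the 26th entry skips back to the 13th), so with
// k = 10, the 260th checkpoint skips back to the 130th, while the
// 261st just points to its predecessor, the 260th.
```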

staltz commented 1 year ago

Unrelated to recent comments, reviving an old comment of mine

Nostr has no previous linking, so it suffers from omitted-record attacks, which is not the case with minibutt because we have previous linking.

I was thinking in the shower about this, and if we don't have a sequence field, then I think omissions are possible, even with the existence of a previous field. Take e.g. the following example:

```mermaid
graph RL
  C-->B-->A
  D-->A
```

Say the remote peer has all of this, and you have nothing. They then send you A, B, C and that's it: you think you have all the messages, but you're actually missing D and you have no idea about that.
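A toy sketch of why the receiver can't notice the omission (made-up message shapes, `key` standing in for the hash):

```javascript
// four messages where D forks off A; the remote peer withholds D
const A = { key: 'A', previous: null };
const B = { key: 'B', previous: 'A' };
const C = { key: 'C', previous: 'B' };
const D = { key: 'D', previous: 'A' }; // never sent

// without sequence numbers, the receiver can only check that every
// `previous` resolves to a message it holds:
function looksComplete(msgs) {
  const keys = new Set(msgs.map((m) => m.key));
  return msgs.every((m) => m.previous === null || keys.has(m.previous));
}

// looksComplete([A, B, C]) is true, yet D is silently missing
```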

gpicron commented 1 year ago

Say the remote peer has all of this, and you have nothing. They then send you A, B, C and that's it: you think you have all the messages, but you're actually missing D and you have no idea about that.

This is the reason for gossip and p2p. It is sufficient to have one honest peer who knows D to guarantee that a sybil attack trying to force this would eventually be discovered and diffused. Otherwise, only the sybil attacker "thinks" that D does not exist.

AljoschaMeyer commented 1 year ago

It doesn't even have to be malicious. D might only be published tomorrow, so you cannot fault anyone for not replicating it today. You cannot distinguish between malicious and non-malicious behavior in these cases. That's one of the reasons to ask for a clear definition of validation, because this scenario is one that validation cannot and hence should not cover.