waku-org / research

Waku Protocol Research
MIT License

How should a client resolve inconsistencies if querying two servers? #40

Open s-tikhomirov opened 8 months ago

s-tikhomirov commented 8 months ago

Let's say a client sends the same request to two servers, and they return divergent responses. This is not unexpected, as we don't have consensus over messages. How does the client resolve this conflict now, and how should it ideally be resolved?

A concrete example I'm thinking about is when a server wants to earn extra rewards and inserts "fake" messages into its response (even with RLN, this is possible to some extent, see #38). Divergence may also happen by accident, when one server has genuinely missed a message in the past.

alrevuelta commented 8 months ago

Let s1 be the messages returned by node 1 and s2 the ones returned by node 2. The best we can do right now is to take the union s1 U s2. Even if one of the nodes creates messages out of thin air, this is rate limited by RLN. But of course, we need something better.

Having no "supra consensus" makes this difficult, so another way could be to increase the number of nodes you query (e.g. 1..10) and filter the messages. For example, if a message is reported by at least 2/3 of the nodes (e.g. 7 out of 10), you accept it as valid. Assuming that the nodes you have picked are random, this filters out possible malicious nodes. It produces a kind of consensus, but one local to each node: in theory, different nodes can reach different conclusions, since they won't have the same view.
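A minimal sketch of the "reported by at least 2/3 of the queried nodes" filter, assuming each response is simply a set of message identifiers. The name `quorum_filter` and the string identifiers are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from fractions import Fraction


def quorum_filter(responses, threshold=Fraction(2, 3)):
    """Return the message IDs reported by at least `threshold` of the nodes.

    `responses` is a list with one set of message IDs per queried node.
    A Fraction is used so the rounding of the quorum size is exact.
    """
    needed = math.ceil(threshold * len(responses))
    # Count each message at most once per node.
    counts = Counter(msg for resp in responses for msg in set(resp))
    return {msg for msg, n in counts.items() if n >= needed}


# Ten nodes: "a" is reported by all, "b" by 7, "c" by 6, "fake" by 1.
responses = [{"a", "b", "c"}] * 6 + [{"a", "b"}] + [{"a"}] * 2 + [{"a", "fake"}]
print(sorted(quorum_filter(responses)))  # ['a', 'b']
```

With 10 nodes the quorum rounds up to 7 reports, so a message seen by only 6 nodes is dropped. Two clients querying different random node sets can still end up with different results.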

Note that this applies to "store synchronization" and not really to https://github.com/waku-org/research/issues/21, since it would be expensive to query that many nodes. You may ask, but how do I know who is making the request:

Well, this is still unanswered, but in the full node case there should be a quid pro quo: you help me synchronize, and I help you.

Related to message reliability: @ABresting

s-tikhomirov commented 8 months ago

is it a full node trying to sync? which should be free?

Would it be acceptable if full nodes also pay for syncing like light nodes do? I'm concerned that we'd introduce some game-theoretic inconsistency if light nodes pay but full nodes don't pay (for the same service). For example, light nodes would be motivated to pretend to be full nodes somehow.

mart1n-xyz commented 8 months ago

You can have the server earning free sync by relaying to light nodes (how to monitor this is another question) so that it covers the cost of syncing if throughput is sufficiently high. However, the same issues as the one above apply imo.

If we have the server pay to sync, this cost would translate to the price per message for a light node. That is not necessarily bad but favours the centralization of servers (sync is a fixed cost).

alrevuelta commented 8 months ago

Would it be acceptable if full nodes also pay for syncing like light nodes do? I'm concerned that we'd introduce some game-theoretic inconsistency if light nodes pay but full nodes don't pay (for the same service). For example, light nodes would be motivated to pretend to be full nodes somehow.

Indeed, that could be a problem. But I'm not sure how practical it would be for full nodes to have to pay for synchronization. I mean, you are a store node, provide service to the network, and yet you have to pay? Doesn't seem fair. But ofc, we need to fix the possible "impersonation" (light nodes pretending to be full nodes to get the service for free).

alrevuelta commented 8 months ago

cc @ABresting

s-tikhomirov commented 8 months ago

you are a store node, provide service to the network, and yet you have to pay?

If you, in turn, get paid for the service you provide, then it could be reasonable. The question is whether we want to introduce market-driven dynamics to full node syncing.

alrevuelta commented 8 months ago

Having no "supra consensus" makes this difficult, so another way could be to increase the number of nodes you query (e.g. 1..10) and filter the messages. For example, if a message is reported by at least 2/3 of the nodes (e.g. 7 out of 10), you accept it as valid. Assuming that the nodes you have picked are random, this filters out possible malicious nodes. It produces a kind of consensus, but one local to each node: in theory, different nodes can reach different conclusions, since they won't have the same view.

Note to self: I don't think this makes sense. Every message that contains a valid rln proof should be accepted. If the store node generates "fake" rln messages (valid rln proof but never sent to the network), well, so be it. Note also that this can be solved in upper layers (e.g., if an rln membership costs money, why would someone do that?).

So inconsistencies should be resolved by taking the union s1 U s2 U ... U sn and keeping only the messages that contain valid rln proofs.
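A rough sketch of this union-with-filter rule. The `Message` type, the `merge_responses` function, and the `verify` callback are all hypothetical; real RLN proof verification is not shown, and a stub stands in for it here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Message:
    msg_hash: str     # hypothetical unique identifier of the message
    payload: bytes
    rln_proof: bytes  # opaque proof bytes, checked by `verify`


def merge_responses(
    responses: Iterable[Iterable[Message]],
    verify: Callable[[Message], bool],
) -> Dict[str, Message]:
    """Union of all responses, keeping only messages whose proof verifies."""
    merged: Dict[str, Message] = {}
    for response in responses:
        for msg in response:
            # Deduplicate by hash; verify only messages not yet accepted.
            if msg.msg_hash not in merged and verify(msg):
                merged[msg.msg_hash] = msg
    return merged


def stub_verify(msg: Message) -> bool:
    """Stand-in for a real RLN verifier (assumption for this sketch)."""
    return msg.rln_proof == b"valid"


m1 = Message("h1", b"hello", b"valid")
m2 = Message("h2", b"world", b"valid")
bad = Message("h3", b"spam", b"broken")
merged = merge_responses([[m1, bad], [m1, m2]], stub_verify)
print(sorted(merged))  # ['h1', 'h2']
```

Note that a message with a valid proof is accepted even if only one server reports it, which is exactly the "so be it" case discussed above.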

ABresting commented 8 months ago

idk how relevant this might be, but distinguishing a light node from a full node, in case there is payment for light nodes but not for full nodes, opens 3 different questions:

RLN messages are verified messages

When syncing RLN-enabled messages, it is simpler to take s1 U s2. First get the messages from the larger of the two sets, max(s1, s2); for the remaining ones, send across the hashes of the messages and fetch only the missing ones in one round trip.
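The hash exchange can be sketched as follows, assuming each store is a dict keyed by message hash and that the hash is SHA-256 of the payload. Both are assumptions for illustration, not necessarily how Waku identifies messages, and `fetch_missing` is a hypothetical name.

```python
import hashlib
from typing import Dict


def msg_hash(payload: bytes) -> str:
    """Hash used to identify a message (SHA-256 is an assumption here)."""
    return hashlib.sha256(payload).hexdigest()


def fetch_missing(local: Dict[str, bytes], peer: Dict[str, bytes]) -> int:
    """Pull from `peer` only the messages that `local` lacks.

    Returns the number of messages added to `local`.
    """
    # Step 1: the peer advertises hashes only, not payloads.
    advertised = set(peer)
    # Step 2: compute locally which hashes are unknown.
    missing = advertised - set(local)
    # Step 3: request just those payloads and check them against the hash.
    added = 0
    for h in missing:
        payload = peer[h]
        if msg_hash(payload) == h:
            local[h] = payload
            added += 1
    return added


s1 = {msg_hash(p): p for p in (b"a", b"b")}
s2 = {msg_hash(p): p for p in (b"b", b"c")}
fetched_by_s1 = fetch_missing(s1, s2)  # s1 pulls b"c"
fetched_by_s2 = fetch_missing(s2, s1)  # s2 pulls b"a"
print(fetched_by_s1, fetched_by_s2, s1 == s2)  # 1 1 True
```

Only the hashes and the missing payloads cross the wire, so messages both sides already hold are never retransmitted.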

ABresting commented 8 months ago

Regarding how to distinguish between a light node and a full node, assuming light nodes have to pay: how about some sort of erasure code that shows the node is indeed a full node, using PoS (proof of storage), like the Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt) mechanisms used in Filecoin?

Certain policies can be devised: for example, if the messages are very recent (let's say < 5 mins), they are free of cost, but if they are older than a threshold, you generate a proof that you are indeed a full node to get the messages for free.
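One way to express such a policy, as a sketch only. The 5-minute window comes from the example above; the function name `requires_payment` and the boolean `has_full_node_proof` are hypothetical, and how the proof itself is produced or checked is left open.

```python
from datetime import timedelta

# Messages younger than this are served for free (value from the example above).
FREE_WINDOW = timedelta(minutes=5)


def requires_payment(message_age: timedelta, has_full_node_proof: bool) -> bool:
    """Decide whether a requester must pay for a message of the given age."""
    if message_age < FREE_WINDOW:
        return False  # recent messages are free for everyone
    # Older messages are free only with a valid proof of being a full node.
    return not has_full_node_proof


print(requires_payment(timedelta(minutes=2), False))  # False
print(requires_payment(timedelta(hours=1), False))    # True
print(requires_payment(timedelta(hours=1), True))     # False
```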

s-tikhomirov commented 8 months ago

Every message that contains a valid rln proof should be accepted. If the store node generates "fake" rln messages (valid rln proof but never sent to the network), well, so be it.

One concern here could be: what if the higher-level application depends on the "common knowledge" assumption? That is, if a node receives a message, not only does it know its content, but it also knows that other nodes know it, and so on. We should make it clear that applications should not make this assumption.

should all messages be priced the same? a 20KB msg vs 1 MB (or max size Waku msg)

I think it would be better to pay per byte, not per message. Otherwise, nodes have incentives to unnecessarily bundle small messages into larger ones, or generally not be frugal w.r.t. how much data they send.
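A small worked comparison of the two pricing schemes, with made-up integer prices (both price values are assumptions for illustration only). It shows why per-message pricing rewards bundling while per-byte pricing does not.

```python
from typing import List


def cost_per_message(sizes: List[int], price_per_message: int) -> int:
    """Total cost when every message costs the same, regardless of size."""
    return len(sizes) * price_per_message


def cost_per_byte(sizes: List[int], price_per_byte: int) -> int:
    """Total cost when the price scales with the number of bytes sent."""
    return sum(sizes) * price_per_byte


small = [20_000] * 50   # fifty 20 KB messages
bundled = [1_000_000]   # the same data packed into one 1 MB message

# Per-message pricing: bundling is 50 times cheaper for the same data.
print(cost_per_message(small, 10), cost_per_message(bundled, 10))  # 500 10
# Per-byte pricing: the cost is identical, so there is no reason to bundle.
print(cost_per_byte(small, 1), cost_per_byte(bundled, 1))  # 1000000 1000000
```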

if the messages are very recent (let's say < 5 mins), they are free of cost, but if they are older than a threshold, you generate a proof that you are indeed a full node to get the messages for free.

It's interesting to think of a definition of a full node here. How many gaps in the history is a node allowed to have and still be considered "full, just not fully synced"? Or, if the criterion for "fullness" is "running Relay", then can't a light node mount Relay to pass this test without actually relaying anything?