waku-org / research

Waku Protocol Research
MIT License
3 stars 0 forks source link

Cost function of message retrieval #35

Open s-tikhomirov opened 8 months ago

s-tikhomirov commented 8 months ago

What should determine the cost of retrieving a message?

PoC approaches

The server may estimate the cost as:

More elaborate cost calculations

The costs of a Store server are storage, bandwidth, and computation. A Store server does two things: it stores messages, and serves messages to clients.

The cost of storing messages is composed of:

The cost of serving messages to clients, per unit time, is composed of:

Storage is likely the dominating cost. Storage costs is proportional to the amount of information stored and the time it is stored for. A cumulative cost of storing a single message grows linearly with time. The number of messages in a response may be approximated by the length of the time frame requested.

Computation: the server spends computing cycles while handling requests. This costs likely depends not only on the computation itself, but also at the database structure. For example, retrieving old or rarely requested messages from the local database may be more expensive than fresh or popular ones due to caching.

More formal calculations should be done, under certain assumptions about message flow (i.e., that it is constant).

fryorcraken commented 7 months ago

With the new store req-res that returns message ids, then it becomes trivial to have a cost per message. This assumes that client does not pay for message ids, but pays for message content:

sequenceDiagram
    client-->>server: request msgs ids for period t1 to t2
    server-->>client: returns 10 msg ids
    client->>client: check seen msgs
    client->>client: 3 msgs are missing
    client->>blockchain: pays server for 3 msgs
    client->>server: request 3 messages, attaches txid
    server->>blockchain: checks payment
    server-->>client: deliver messages

PS: sorry about the label deletion, this PR will remove this kind of automtaed action: https://github.com/waku-org/pm/pull/103

s-tikhomirov commented 7 months ago

client does not pay for message ids, but pays for message content

The scheme makes a ton of sense!

Do you think we should aim for it in the PoC, or in later protocol iterations? As of the current PoC outline, the client skips the request-ids stage and instead pays a fixed price per hour of history, and gets all relevant messages in response, no matter if the client already has them or not. This is kinda wasteful (client gets messages it already has) but conceptually simpler (just one round-trip).

jm-clius commented 7 months ago

Another thing that I like about this scheme is that the client can also use the list of returned hashes:

fryorcraken commented 7 months ago
  • to compare responses by different store nodes and choosing the one with the most up-to-date response

Absolutely, I think I wrote somewhere how Alice could then do a sync req to Bob and Carole. Compare what messages they have, query/pays Bob because he's cheapest to retrieve the bulk of messages, and then query/pays Carole of a unique message she has, but she is more expensive.

It may also make the market more interesting:

Alice will always use Bob for the most recent messages, but has to use Carole if she was offline for more than 12 hours.

What is interesting here, is that once the protocol is available, this is purely an engineering problem of window-shopping and cherry-picking.

client does not pay for message ids, but pays for message content

The scheme makes a ton of sense!

Do you think we should aim for it in the PoC, or in later protocol iterations? As of the current PoC outline, the client skips the request-ids stage and instead pays a fixed price per hour of history, and gets all relevant messages in response, no matter if the client already has them or not. This is kinda wasteful (client gets messages it already has) but conceptually simpler (just one round-trip).

I will let you and @jm-clius decide here. I think it may be worth considering having it from PoC, if store sync can be delivered before.

My current opinion is that when store sync is available, then we can assume the best practice will become: (A)

instead of (B)

My expectations might be incorrect tho. My assumption is that (A) gives you a better bandwidth/latency ratio.

store sync:

query with msg id:

Even if we always 2 rounds trips, the benefit over (B) should be visible as soon as (B) returns more than 1 page of results.

Note: when I write "store sync" I mean "store query to retrieve messages id". This may not be the correct/final name of the protocol.

s-tikhomirov commented 7 months ago

My current opinion is that when store sync is available, then we can assume the best practice will become: (A) store sync store query missing messages using id.

My tangential concern here is about the relationship between Store sync and incentivized Store requests. If both are available, why would a rational client use incentivized requests? It can "pretend" to be a syncing server and get the same data for free. I know there may be heuristics to distinguish such clients-pretending-to-be-servers, but I'm not sure we can do it with certainty.