perama-v commented 1 year ago

Problem

If the GAMB contract is deployed on mainnet, then it could at times be prohibitively expensive to maintain a publishing cadence.

For example, at 21 Gwei/gas a publish transaction costs $1. If the gas price is 100 Gwei/gas, the same transaction would cost about $5.
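A publish transaction uses a fixed amount of gas, so its cost scales linearly with the gas price. The small sketch below rescales the $1 at 21 Gwei/gas figure above. It assumes that the gas used and the ETH/USD rate stay constant. `publish_cost_usd` is a hypothetical helper.

```python
def publish_cost_usd(gas_price_gwei: float,
                     ref_gas_price_gwei: float = 21.0,
                     ref_cost_usd: float = 1.0) -> float:
    """Rescale a known publish cost to another gas price.

    Assumes the transaction's gas used and the ETH/USD rate are unchanged,
    so the cost is directly proportional to the gas price.
    """
    return ref_cost_usd * gas_price_gwei / ref_gas_price_gwei


for gwei in (21, 50, 100):
    print(f"{gwei:>3} Gwei/gas -> ${publish_cost_usd(gwei):.2f}")
```

At 100 Gwei/gas this gives $4.76, which is the roughly $5 quoted above. Every publish pays this again, so the cost grows with the publishing cadence.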

This could reduce participation, or at least reduce the frequency with which a publisher chooses to broadcast.

Solution

One-time "I'm a publisher for this topic" transactions.

Publisher

A publisher could use IPNS as the pointer for their published manifests.

This would entail a one-time Ethereum mainnet transaction to the Broadcaster contract containing two elements:

IPNS ID (e.g., k52...abc)
Topic ID (e.g., my-special-database)

Then generate the manifest (which contains immutable CIDs for the database) and compute its manifest_CID.

Then publish the manifest CID to IPNS (ipfs name publish manifest_CID).

A few hours later they may have a new discrete piece to add to the database:

Generate the piece, including its CID
Update the manifest, get a new manifest_CID
ipfs name publish manifest_CID

No on-chain transaction necessary!
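The publishing loop above can be modelled in a few lines. This is a hypothetical sketch. SHA-256 over canonical JSON stands in for a real IPFS CID; real CIDs are multihash-encoded and depend on how the data is chunked. The manifest fields are invented for illustration, and the dictionary `ipns_pointer` stands in for `ipfs name publish`. The sketch shows that only the mutable pointer changes, and nothing is written on chain.

```python
import hashlib
import json


def fake_cid(obj) -> str:
    """Stand-in for an IPFS CID: SHA-256 of canonical JSON (not a real CID)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def add_piece(manifest: dict, piece: bytes) -> dict:
    """Return a new manifest that also references `piece`; the old one is untouched."""
    piece_cid = hashlib.sha256(piece).hexdigest()
    return {**manifest, "pieces": manifest["pieces"] + [piece_cid]}


ipns_pointer = {}  # hypothetical stand-in: IPNS name -> current manifest CID

manifest = {"topic": "my-special-database", "pieces": []}
manifest = add_piece(manifest, b"piece 1")
ipns_pointer["k52...abc"] = fake_cid(manifest)  # like: ipfs name publish manifest_CID
first_manifest_cid = ipns_pointer["k52...abc"]

# A few hours later: add a piece, recompute the manifest CID, repoint the name.
manifest = add_piece(manifest, b"piece 2")
ipns_pointer["k52...abc"] = fake_cid(manifest)

print(first_manifest_cid != ipns_pointer["k52...abc"])  # True
```

The IPNS name stays the same while the manifest CID it resolves to changes. The earlier piece keeps its original identifier.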

If the publisher loses their IPNS key, they can submit a new transaction registering themselves as a new publisher.
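The contract interface is not specified in this issue. As a rough, hypothetical model of what the Broadcaster needs to store, an append-only mapping from topic ID to registered IPNS IDs is enough. The names `Broadcaster`, `register` and `publishers` are illustrative, not the contract's actual API. Each `register` call models one mainnet transaction, and no ownership or spam checks are performed. Ignoring duplicate registrations is a choice made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


class Broadcaster:
    """In-memory stand-in for the on-chain registry (hypothetical API).

    Each register() call models one mainnet transaction. Nothing is
    verified: any (IPNS ID, topic ID) pair is accepted.
    """

    def __init__(self) -> None:
        self._by_topic: Dict[str, List[str]] = defaultdict(list)

    def register(self, ipns_id: str, topic_id: str) -> None:
        # Append-only, like an event log; duplicates are ignored in this sketch.
        if ipns_id not in self._by_topic[topic_id]:
            self._by_topic[topic_id].append(ipns_id)

    def publishers(self, topic_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the registry.
        return list(self._by_topic.get(topic_id, []))


registry = Broadcaster()
registry.register("k52...abc", "my-special-database")
# The publisher lost its IPNS key, so it registers a fresh name.
registry.register("k51...123", "my-special-database")
print(registry.publishers("my-special-database"))
```

In this model the stale name stays listed after a re-registration. A consumer that compares manifests will simply stop picking it once it no longer updates.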

Consumer

Wants up-to-date manifests for the topic my-special-database. Consults the Broadcaster contract, which returns three publisher IPNS IDs:

> k52...abc
> k51...123
> k54...456

Resolves each IPNS ID via IPFS to fetch each publisher's manifest. Makes an informed decision about which is most up to date using the information within the manifest (e.g., picks the one with the most pieces or the highest block number).

Uses the CIDs from the favourite manifest to get the data.
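The choice of manifest can be sketched as follows. The sketch assumes each resolved manifest carries a `latest_block` field and a `pieces` list. Both field names are hypothetical, since the manifest schema is not defined in this issue. Ties on block number fall back to the piece count, and the block numbers are toy values.

```python
from typing import Dict, Optional


def most_up_to_date(manifests: Dict[str, dict]) -> Optional[str]:
    """Return the IPNS ID whose manifest looks freshest.

    `manifests` maps IPNS ID -> resolved manifest. Compares the (assumed)
    `latest_block` field first, then the number of pieces.
    """
    if not manifests:
        return None
    return max(
        manifests,
        key=lambda name: (
            manifests[name].get("latest_block", -1),
            len(manifests[name].get("pieces", [])),
        ),
    )


resolved = {
    "k52...abc": {"latest_block": 100, "pieces": ["a", "b", "c"]},
    "k51...123": {"latest_block": 105, "pieces": ["a", "b", "c", "d"]},
    "k54...456": {"latest_block": 90, "pieces": ["a", "b"]},
}
print(most_up_to_date(resolved))  # k51...123
```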

Unresolved

IPNS ID derivation

Can a single IPFS node maintain different IPNS keys?

E.g., a single node might want to publish different topics, and could perhaps create a new key (and hence IPNS name) for each.

perama-v commented 1 year ago

The key point is that old data remains unaltered: its CIDs are unaffected, and multiple publishers publishing manifests under different IPNS IDs all reference the same CIDs. Thus each publisher amplifies the availability of the same data, and pushing regular updates doesn't degrade the system.
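Because CIDs are derived from content, two publishers holding the same piece necessarily reference the same identifier. The small sketch below uses SHA-256 as a stand-in for a real CID. `piece_id` and the publisher lists are hypothetical. It shows that two manifests built independently overlap on every piece they share.

```python
import hashlib


def piece_id(data: bytes) -> str:
    # Stand-in for a CID: identical bytes always give an identical identifier.
    return hashlib.sha256(data).hexdigest()


pieces = [b"chunk-0", b"chunk-1", b"chunk-2"]

# Two publishers build manifests independently; Charlie is one piece behind.
manifest_alice = [piece_id(p) for p in pieces]
manifest_charlie = [piece_id(p) for p in pieces[:2]]

shared = set(manifest_alice) & set(manifest_charlie)
print(len(shared))  # 2: both manifests point at the same two pieces
```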

perama-v commented 1 year ago

Regarding IPNS ID derivation: It looks like a publisher could use different keys for different topics as follows:

Generate

ipfs key gen --type=rsa --size=2048 topic-a
ipfs key gen --type=rsa --size=2048 topic-b

Publish

ipfs name publish --key=topic-a /ipfs/manifest_hash_a
ipfs name publish --key=topic-b /ipfs/manifest_hash_b

perama-v commented 1 year ago

Note: If anyone can submit a Tx to associate an (IPNS, topic) pair, what is to stop someone from associating IPNS names they do not own?

E.g.

Alice: Tx (alice_ipns, "database_one")
Mallory: Tx (alice_ipns ❌, "database_two") (should be mallory_ipns)
Charlie: Tx (charlie_ipns, "database_two")
Dave:
a. Query("database_two") -> response: alice_ipns, charlie_ipns
b. IPNS lookup of alice_ipns -> content: manifest for "database_one" ❌ (was expecting "database_two")
c. IPNS lookup of charlie_ipns -> content: manifest for "database_two"

Mallory can trick Dave into looking up an IPNS name that doesn't contain the expected data. Here Mallory spoofed Alice, but could equally have registered any non-existent IPNS name.

This is easy for Dave to resolve: after retrieving each manifest, he (his software) can check that it is for the correct database. In this case, the manifest found at alice_ipns is for "database_one" and is rejected. He can also choose to remember that charlie_ipns contains the correct manifest and prefer it in the future.

Conclusion: It is easy for honest users to filter out spam publishers at little or no cost (a single file download and an automated check per publisher). Moreover, the attacker gains nothing material and must pay for a transaction for every (IPNS, topic) pair they associate.
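Dave's automated check can be sketched as a filter over resolved manifests. The following are assumptions made for illustration. Each manifest has a `topic` field naming its database. `resolve` is any callable that maps an IPNS name to a manifest, or to `None` if the name does not resolve. `filter_publishers` is a hypothetical helper.

```python
from typing import Callable, List, Optional, Set


def filter_publishers(
    topic: str,
    candidates: List[str],
    resolve: Callable[[str], Optional[dict]],
    trusted: Set[str],
) -> List[str]:
    """Return IPNS names whose manifest is for `topic`, previously trusted first.

    `trusted` holds names already verified for this topic (keep one set per
    topic); names that pass the check are added to it for later queries.
    """
    previously_trusted = set(trusted)
    good = []
    for name in candidates:
        manifest = resolve(name)
        if manifest is None or manifest.get("topic") != topic:
            continue  # spoofed, wrong topic, or non-existent IPNS name
        good.append(name)
        trusted.add(name)
    # Stable sort: previously trusted names first, original order otherwise.
    return sorted(good, key=lambda name: name not in previously_trusted)


ipns = {
    "alice_ipns": {"topic": "database_one"},
    "charlie_ipns": {"topic": "database_two"},
}
trusted_for_db_two: Set[str] = set()
result = filter_publishers(
    "database_two", ["alice_ipns", "charlie_ipns", "bogus_ipns"], ipns.get, trusted_for_db_two
)
print(result)  # ['charlie_ipns']
```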

tjayrush commented 1 year ago

Short (uninvited, I admit) comment here...

In the original design of the Unchained Index we purposefully imposed the cost of publication on the publisher as a "promise of correctness." It's an anti-spam measure.

If a publisher need only pay once, to register an IPNS pointer with the smart contract, and can then modify the underlying data freely, this seems to me to defeat the purpose.

One could just as easily publish the new hashes to a website.

By allowing the IPNS pointer to change at no cost, you've removed the only small measure we imposed to extract a cost from the publisher.

perama-v commented 1 year ago

Ok thanks for your perspective 🙂

perama-v commented 1 year ago

I agree that the publishing mechanism is not spam resistant.

So, to summarise, it offers:

censorship resistance
content discovery mechanism
low cost (one time tx, zero ongoing)
no promise of the correctness of content. A user faced with a deluge of publishers claiming to have wonderful index data would have to either inspect each publisher's data and make an assessment, or find an out-of-band way to make such an assessment. How the assessment is done depends on the kind of data. For example, the Unchained Index of address appearances could be checked against a node by sampling different addresses and confirming that the listed blocks contain each address.
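The sampling idea for the address index could look roughly like the following. Everything here is an assumption for illustration. The index is modelled as a mapping from address to the block numbers it claims. `block_contains(block, address)` stands in for whatever check the user runs against their own node. The toy `chain` dictionary plays that role here. This kind of check only catches false claims; it does not detect appearances the publisher omitted. Passing a sample raises confidence but does not prove the whole index correct.

```python
import random
from typing import Callable, Dict, List


def spot_check(
    index: Dict[str, List[int]],
    block_contains: Callable[[int, str], bool],
    samples: int = 10,
    seed: int = 0,
) -> bool:
    """Sample addresses from a publisher's index and confirm each claimed
    appearance against the user's own node. Returns False on the first mismatch."""
    rng = random.Random(seed)
    addresses = sorted(index)
    for address in rng.sample(addresses, min(samples, len(addresses))):
        for block in index[address]:
            if not block_contains(block, address):
                return False
    return True


# Toy "node": block number -> addresses that really appear in that block.
chain = {1: {"0xaa", "0xbb"}, 2: {"0xbb"}, 3: {"0xcc"}}


def node_check(block: int, address: str) -> bool:
    return address in chain.get(block, set())


honest = {"0xaa": [1], "0xbb": [1, 2], "0xcc": [3]}
spammy = {"0xaa": [1, 3], "0xbb": [2]}  # claims 0xaa appears in block 3
print(spot_check(honest, node_check), spot_check(spammy, node_check))  # True False
```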

perama-v commented 1 year ago

An example for a different data kind: 4byte mappings. A user can inspect the mappings and hash them to easily verify correctness. The fact that ongoing publishing is free might encourage more publishers to frequently point their IPNS to their most recently updated data.

A spam publisher can be weeded out by checking the mappings and, if they are incorrect, ignoring that publisher from then on.
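Verifying a 4byte mapping means recomputing the selector, which is the first four bytes of the Keccak-256 hash of the function signature. Python's `hashlib.sha3_256` is the standardised SHA3-256. Its padding differs from the original Keccak-256 that Ethereum uses, so it produces different digests. To stay dependency-free, the sketch below includes a compact pure-Python Keccak-256. It is slow and meant for illustration only; in practice a library such as pycryptodome's `Crypto.Hash.keccak` would be used. The `published` mapping is made-up example data, and `bad_mappings` is a hypothetical helper.

```python
from typing import Dict, List

MASK64 = (1 << 64) - 1

# Keccak-f[1600] round constants.
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rho rotation offsets, indexed ROT[x][y].
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rotl(value: int, shift: int) -> int:
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane state[x][y]."""
    for rc in RC:
        # theta: mix each column's parity into neighbouring columns
        c = [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        state = [[state[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rotl(state[x][y], ROT[x][y])
        # chi: non-linear row mixing
        state = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
                 for x in range(5)]
        # iota: break symmetry with the round constant
        state[0][0] ^= rc
    return state


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (Ethereum's hash), not NIST SHA3-256."""
    rate = 136  # bytes absorbed per block (1088 bits)
    padded = bytearray(data) + b"\x01"  # Keccak domain padding byte
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        for i in range(rate // 8):
            lane = int.from_bytes(padded[offset + 8 * i: offset + 8 * i + 8], "little")
            state[i % 5][i // 5] ^= lane
        state = _keccak_f(state)
    # Squeeze 32 bytes: the first four lanes, little-endian.
    return b"".join(state[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """4-byte selector: first 4 bytes of keccak256 of the signature."""
    return "0x" + keccak256(signature.encode("ascii")).hex()[:8]


def bad_mappings(mappings: Dict[str, str]) -> List[str]:
    """Return the selectors whose claimed signature does not hash to them."""
    return [sel for sel, sig in mappings.items() if selector(sig) != sel.lower()]


published = {
    "0xa9059cbb": "transfer(address,uint256)",
    "0x70a08231": "balanceOf(address)",
    "0x12345678": "mint(address,uint256)",  # hypothetical spam entry
}
print(bad_mappings(published))  # ['0x12345678']
```

A publisher whose mappings produce any entry in `bad_mappings` can be ignored from then on, as described above.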

tjayrush commented 1 year ago

> A user can inspect the mappings and hash them to easily verify correctness.

That's right. I remember thinking about this, but I always forget to put it into words. But this is super important.

The ability of the end user to take the function signature that is returned when he/she queries a four-byte, and hash it themselves, is a perfect example of "it's the responsibility of the user to prove it for themselves." We give them enough information so that they can prove it themselves, and they are responsible for doing so if they wish. Anti-spam is in their hands, and the smart contract doesn't care. (It's like a honey badger in that sense.)

This is great. "Reproducibility" is important. In the case of fourbytes, it's super simple. In the case of the address index, it's possible, even if it takes a lot more work. But in both cases, it's possible.

perama-v / GAMB

Ongoing cost to publish #1