onename profiles - GitHub Issues

jbenet commented 9 years ago

Let's put all onename profiles on IPFS.

Steps (afaik):

[ ] add IPFS as a storage backend to https://github.com/blockstack/blockstore
- [ ] swappable backends in https://github.com/blockstack/blockstore (may or may not be there already)
[ ] replicate the data

It would be useful to maintain a pre-computed index of all the data and keep its head at a well-known place (bound to ONS/onename, IPNS, and DNS). The index can be verified against the blockchain but allows fast access. (I assume https://github.com/blockstack/blockstore already computes such an index; I mean making it directly accessible to any IPFS node too.)
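A minimal sketch of what "verify the index, then use it for fast access" could look like. It assumes SHA-256 for hash(data) and sorted, compact JSON as the canonical serialization. Both are assumptions, not Blockstore's actual format, and binding the head to ONS, IPNS, or DNS is left out.

```python
from __future__ import annotations

import hashlib
import json


def build_index(records: dict[str, bytes]) -> dict[str, str]:
    """Map each human-readable name to the SHA-256 hex digest of its data (assumed hash)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in records.items()}


def index_head(index: dict[str, str]) -> str:
    """Hash a canonical serialization so every node derives the same head for the same index."""
    canonical = json.dumps(index, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def lookup(index: dict[str, str], head: str, name: str) -> str:
    """Return hash(data) for name, but only if the index matches the published head."""
    if index_head(index) != head:
        raise ValueError("index does not match the published head")
    return index[name]
```

Once the head has been checked against the blockchain, every lookup after that is a local dictionary access.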

jbenet commented 9 years ago

cc @muneeb-ali @jcnelson

@jcnelson, do you want me to track replication to https://github.com/jcnelson/syndicate here too, or would you rather track that separately? (Asking first before assuming :) )

muneeb-ali commented 9 years ago

Blockstore already has a bunch of "storage drivers" (what you referred to as "backends"):

https://github.com/blockstack/blockstore-client/tree/master/blockstore_client/drivers

Vanilla Linux, DHT, and S3, to be more precise. It would also be quick to port some of the drivers we wrote earlier for Syndicate to this model, and I believe this is where the IPFS driver can go. One concern: since these drivers are currently implemented in blockstore_client (and with good reason), there might be redundant work down the road if someone writes a client in another language (which is already happening). Just something to keep in mind.

Replicating the data should be pretty straightforward.

The pre-computed index of all the data (human-readable key, hash(data)) currently lives in a database alongside Blockstore. The Merkle hash of this global state is announced in the blockchain along with new operations.
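To make "the Merkle hash of this global state" concrete, here is one common way to compute a Merkle root over (name, hash(data)) entries. The leaf encoding (name, a zero byte, then the hex digest), SHA-256, and duplicating the last node on odd levels (as Bitcoin does) are all assumptions. Blockstore's actual tree layout may differ.

```python
import hashlib


def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()


def merkle_root(index: dict) -> str:
    """Merkle root over an index mapping name -> hex hash(data)."""
    # One leaf per (name, hash(data)) pair, sorted by name so every node agrees on the order.
    level = [_h(name.encode() + b"\x00" + digest.encode())
             for name, digest in sorted(index.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```

Announcing only the root keeps each blockchain operation small. Any node that replicates the index can recompute the root and compare it with the announced value.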

For Syndicate, things are a little different because Syndicate follows the design of using "importers" for pulling different types of data into Syndicate. @jcnelson can confirm, but my understanding is that for Syndicate, instead of going the driver route, Jude will just implement an "importer" in Syndicate itself.

The driver model will hide use of IPFS from blockstore users. If someone wants to mount the blockchain ID namespace and the associated data directly via IPFS, what interface will IPFS provide?

jcnelson commented 9 years ago

There would only need to be a generic Syndicate driver for Blockstore. Syndicate already has Python bindings to make this possible.

Syndicate itself is designed to handle interfacing with back-end storage providers, intermediate CDNs, external datasets, data indexes, and application-defined storage logic (like deduplication, encryption, access logging, replica placement, etc.) on behalf of applications like Blockstore.

@jbenet Not sure what you're asking?

muneeb-ali commented 9 years ago

I think he is asking if it's OK to discuss Syndicate mirroring here vs. on the Syndicate GitHub repo.

jcnelson commented 9 years ago

Ah, okay. Let's keep this discussion in one place :)

jbenet commented 9 years ago

Sorry guys, I ended up being unable to visit and work on this last week. But here is an outline of what we need to do:

Something that would help. Could you:

[ ] point to the pieces of code that abstract out storage in blockstack/blockstore? (file ideally)
[ ] describe the relevant data structures briefly? (the types, basically)
[ ] describe any relevant media that should be accessible as regular posix files? (e.g. images)
[ ] name any other entities you'd want to link to but cannot meaningfully embed at the moment?

So to make ipfs-backed-blockstore we need to:

[ ] identify all the blockstore data to replicate
- see the data structures point above; this should be very straightforward.
[ ] implement an ipfs-backed storage module.
- we have https://github.com/ipfs/python-ipfs-api (thanks @amstocker and the py-ipfs folks), so interfacing should be trivial.
[ ] set up a deployment in IPFS Community infrastructure to replicate all the onename data for you, running ipfs-backed-blockstore
[ ] stretch goal: set it up to produce an ipfs-head hash periodically (with every Bitcoin block?)
- [ ] set up a DNS TXT record to bootstrap this
- [ ] show you how to sign it and publish it with ipns
- [ ] add onename resolution to ipns (see below)
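For the storage-module step, here is a sketch of what an IPFS-backed module for immutable data might look like. The client is injected. Its `add_bytes()`/`cat()` methods are modeled on the `ipfshttpclient` package rather than the older python-ipfs-api linked above, so check the exact calls against whichever binding is used. The key-to-hash map is kept in memory here only for illustration.

```python
from typing import Optional, Protocol


class IpfsClient(Protocol):
    # Assumed client shape (modeled on ipfshttpclient); verify against the binding you use.
    def add_bytes(self, data: bytes) -> str: ...
    def cat(self, cid: str) -> bytes: ...


class IpfsImmutableStore:
    """Hypothetical immutable-data storage module backed by an IPFS node."""

    def __init__(self, client: IpfsClient):
        self.client = client
        self.keys: dict = {}  # Blockstore key -> IPFS content hash (in-memory, illustrative)

    def put(self, key: str, data: bytes) -> str:
        cid = self.client.add_bytes(data)  # the node stores the bytes and returns their hash
        self.keys[key] = cid
        return cid

    def get(self, key: str) -> Optional[bytes]:
        cid = self.keys.get(key)
        return None if cid is None else self.client.cat(cid)
```

Injecting the client keeps the module testable with a fake node and makes it easy to swap bindings later.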

@judenelson: i suspect it would look similar for syndicate? o/ maybe drop an equivalent task list here?

Separately, to add onename resolution to /ipns/<name> paths in ipfs, we'll need to:

[ ] select a light client
- one that doesn't replicate the entire blockchain.
- ideally, with the stretch goal above, it will just be plain ipfs
[ ] implement a name resolver in the ipns module of https://github.com/ipfs/go-ipfs/
[ ] resolve something like /ipns/jbenet.one (or whatever non-DNS-clashing TLD you use; I'm offline as I write this)
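Here is a Python sketch of the resolver logic. The real resolver would be written in Go inside go-ipfs. It assumes a `.one` TLD, which the thread leaves open, and a hypothetical `index` that maps bare names to the `/ipfs/` path of each profile, for example one obtained from the periodically published head in the stretch goal.

```python
def resolve_ipns(path: str, index: dict, tld: str = "one") -> str:
    """Rewrite /ipns/<name>.<tld>[/rest] to /ipfs/<profile>[/rest] using a name index."""
    prefix = "/ipns/"
    if not path.startswith(prefix):
        raise ValueError("not an /ipns/ path")
    name, _, rest = path[len(prefix):].partition("/")
    if not name.endswith("." + tld):
        raise LookupError("not an onename name; leave it to the DNS/IPNS resolvers")
    target = index[name[: -len(tld) - 1]]  # "jbenet.one" -> "jbenet"; KeyError if unknown
    return target + ("/" + rest if rest else "")
```

Names outside the assumed TLD raise `LookupError`, so a caller could fall through to the existing DNS and IPNS resolution paths.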

jcnelson commented 9 years ago

point to the pieces of code that abstract out storage in blockstack/blockstore? (file ideally)

There are eight methods to implement: get/put/delete for immutable and mutable data, a one-off initialization method, and a method to generate a driver-interpreted URL to mutable data.
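Here is a minimal in-memory driver that shows the shape of those eight operations. The method names and signatures are hypothetical simplifications. The real handler names and arguments are whatever disk.py defines.

```python
from typing import Optional


class MemoryDriver:
    """Hypothetical in-memory storage driver with the eight operations described above."""

    def storage_init(self) -> bool:
        # One-off initialization.
        self.immutable: dict = {}
        self.mutable: dict = {}
        return True

    def make_mutable_url(self, data_id: str) -> str:
        # Driver-interpreted URL for a mutable record (scheme is illustrative).
        return "mem://mutable/" + data_id

    def put_immutable(self, key: str, data: bytes) -> bool:
        self.immutable[key] = data
        return True

    def get_immutable(self, key: str) -> Optional[bytes]:
        return self.immutable.get(key)

    def delete_immutable(self, key: str) -> bool:
        return self.immutable.pop(key, None) is not None

    def put_mutable(self, data_id: str, data_json: str) -> bool:
        self.mutable[self.make_mutable_url(data_id)] = data_json
        return True

    def get_mutable(self, url: str) -> Optional[str]:
        return self.mutable.get(url)

    def delete_mutable(self, data_id: str) -> bool:
        return self.mutable.pop(self.make_mutable_url(data_id), None) is not None
```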

Example disk driver: https://github.com/blockstack/blockstore-client/blob/master/blockstore_client/drivers/disk.py

describe the relevant data structures briefly? (the types, basically)

Immutable data, mutable data, and routes.

Immutable data is unchanging and content-addressed--the hash for an immutable datum is embedded in a user's profile directly, and the hash of the user's profile is embedded in the blockchain. Immutable data has very high authenticity and integrity guarantees (as strong as the blockchain), but at the cost of having to send a transaction each time the user puts or deletes an immutable data record.
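Because immutable data is content-addressed, a reader can check whatever a storage provider returns against the hash embedded in the profile. Here is a sketch. It assumes hex-encoded SHA-256 (the actual hash function is an assumption) and uses a caller-supplied `fetch` function as a stand-in for a storage driver.

```python
import hashlib
from typing import Callable, Optional


def fetch_immutable(expected_hash: str, fetch: Callable[[str], Optional[bytes]]) -> bytes:
    """Accept an immutable datum only if it hashes to the value from the user's profile."""
    data = fetch(expected_hash)
    if data is None or hashlib.sha256(data).hexdigest() != expected_hash:
        raise ValueError("missing or tampered immutable data")
    return data
```

Because the check depends only on the hash, the storage provider does not need to be trusted for integrity. It only needs to be available.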

Mutable data is URL-addressed, and is atomically signed and versioned by the writer. The URLs and writer public key are treated as a specially-crafted piece of immutable data (called a route), but the data the URLs refer to can be written and rewritten at line rate by the writer. Readers check and cache the version for each mutable data record to avoid stale data, and use the public key to verify the data and version's authenticity. While writes to mutable data are much faster, the downside is that a malicious network or storage provider can deny readers fresh data by hiding new writes; we hedge against this by giving the user the choice of storage providers, and replicating to a set of them.
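Here is a sketch of the reader side of this scheme. The reader queries every replica, drops copies that fail verification, keeps the highest `ver`, and refuses anything older than the version it has already cached. Signature checking is delegated to a caller-supplied `verify(record)`, which would use the route's public key. The function and its arguments are hypothetical.

```python
from typing import Callable, Iterable, Optional


def read_freshest(data_id: str,
                  providers: Iterable[Callable[[str], Optional[dict]]],
                  verify: Callable[[dict], bool],
                  cache: dict) -> dict:
    """Return the newest authentic copy of a mutable record across replicas."""
    best = None
    for fetch in providers:
        try:
            record = fetch(data_id)
        except OSError:
            continue  # an unreachable provider is skipped, not fatal
        if record is None or record.get("id") != data_id or not verify(record):
            continue  # missing, mislabeled, or badly signed copy
        if best is None or record["ver"] > best["ver"]:
            best = record
    if best is None or best["ver"] < cache.get(data_id, -1):
        raise LookupError("no fresh, authentic copy available")
    cache[data_id] = best["ver"]  # remember the version to reject later rollbacks
    return best
```

This only prevents rollback below the cached version. As noted above, a provider can still withhold newer writes, which is why reading from several replicas helps.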

An immutable datum is a binary string. Mutable data and routes are JSON documents that adhere to these schemas (taken from https://github.com/blockstack/blockstore-client/blob/master/blockstore_client/storage.py):

# mutable storage route
ROUTE_SCHEMA = {
   "id": schemas.STRING,
   "urls": [ schemas.STRING ],
   schemas.OPTIONAL( "pubkey" ): schemas.STRING
}

# mutable data schema
MUTABLE_DATA_SCHEMA = {
   "id": schemas.STRING,
   "data": schemas.B64STRING,
   "ver": schemas.INTEGER,
   "sig": schemas.B64STRING
}
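Here is a stand-alone approximation of the check that MUTABLE_DATA_SCHEMA describes. It is not Blockstore's `schemas` machinery, just plain type checks plus strict base64 decoding of `data` and `sig`.

```python
import base64
import binascii

# Field name -> required Python type, following MUTABLE_DATA_SCHEMA above.
MUTABLE_FIELDS = {"id": str, "data": str, "ver": int, "sig": str}


def check_mutable_record(record: dict) -> bytes:
    """Raise ValueError unless record matches the schema; return the decoded data."""
    for field, typ in MUTABLE_FIELDS.items():
        if type(record.get(field)) is not typ:  # exact type, so True is not accepted as an int
            raise ValueError(f"bad or missing field: {field}")
    try:
        base64.b64decode(record["sig"], validate=True)
        return base64.b64decode(record["data"], validate=True)
    except binascii.Error as exc:
        raise ValueError("data and sig must be base64") from exc
```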

describe any relevant media that should be accessible as regular posix files? (e.g. images)

Not sure if "should" is the right word. Blockstore's client library and command-line tool already provide a JSON-RPC interface for getting, putting, and deleting mutable and immutable data.

If you wanted to abstract these records as files, my recommendation would be:

- Treat an immutable datum as a read-only file with mode 0444.
- Treat the data field in a mutable datum as a read/write file with mode 0644 (named by its id), provided that the ver and sig fields are regenerated on each write. Note that the filesystem client would need to atomically increment ver and would need some way to obtain the writer's private key to generate sig. Our client library for Blockstore already does this.
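Here is a sketch of the write path for the mutable-file case. A write produces a new record with the data re-encoded, `ver` incremented, and `sig` regenerated. The `sign` callable (which holds the writer's private key) and the signed-message layout `id:ver:data` are hypothetical. Making the read-increment-write atomic is left to the caller, for example by holding a per-record lock.

```python
import base64
from typing import Callable

IMMUTABLE_FILE_MODE = 0o444  # immutable datum: read-only for everyone
MUTABLE_FILE_MODE = 0o644    # mutable datum: owner read/write, others read-only


def write_mutable(record: dict, new_data: bytes, sign: Callable[[bytes], bytes]) -> dict:
    """Return the record a file write would produce; the input record is not modified."""
    ver = record["ver"] + 1
    data_b64 = base64.b64encode(new_data).decode()
    message = f'{record["id"]}:{ver}:{data_b64}'.encode()  # hypothetical signed-bytes layout
    sig = base64.b64encode(sign(message)).decode()
    return {"id": record["id"], "data": data_b64, "ver": ver, "sig": sig}
```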

@judenelson: i suspect it would look similar for syndicate? o/ maybe drop an equivalent task list here?

The Syndicate driver would write the serialized JSON records as files under a given directory in the user's Syndicate volume (not too different from how the disk driver works).
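Here is a sketch of that layout. It assumes the Syndicate volume is reachable as a local directory (the real driver would go through Syndicate's Python bindings instead) and uses a hypothetical `<id>.json` naming scheme. Writing to a temporary file and then renaming it keeps readers from seeing half-written records.

```python
import json
import os
import tempfile


def write_record(volume_dir: str, record: dict) -> str:
    """Serialize a record to <volume_dir>/<id>.json, replacing any older copy."""
    path = os.path.join(volume_dir, record["id"] + ".json")
    fd, tmp = tempfile.mkstemp(dir=volume_dir)  # same directory, so same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f, sort_keys=True)
        os.replace(tmp, path)  # atomic rename on POSIX within one filesystem
    except Exception:
        os.unlink(tmp)  # don't leave partial temp files in the volume
        raise
    return path


def read_record(volume_dir: str, data_id: str) -> dict:
    with open(os.path.join(volume_dir, data_id + ".json")) as f:
        return json.load(f)
```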

ipfs / notes

onename profiles #57