tent / tent.io

The website for Tent — the protocol for evented data storage and decentralized communication
https://tent.io

0.4 Schemas #212

Open danielsiders opened 10 years ago

danielsiders commented 10 years ago

Tent 0.4 will feature machine-readable schemas. This will allow greater portability and interoperability of data and posts between applications and a long term data upgrade path.

Basics

Every post type will require a schema, published and maintained by its creator. Post types are identified by the URI where the corresponding schema is published, and schemas can be published anywhere.

Schemas are immutable.

Tent schemas build on the ideas presented in JSON Schema. Schemas specify metadata and validation rules for Tent posts. Schemas will use the standard JSON data types with additional metadata, including an optional semantic URI and validation rules for Tent links.

Schemas will also include validation for attachments.
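The proposal only says that Tent schemas build on JSON Schema, so the following is a minimal sketch assuming Tent schemas reuse the JSON Schema keywords `type`, `properties` and `required`. It checks a post's content against such a schema; `validate` and the status-like schema are hypothetical and cover only a tiny subset of JSON Schema.

```python
from typing import Any

# Assumption: Tent schemas reuse these JSON Schema keywords. Only a tiny
# subset ("type", "properties", "required") is handled here.
_PY_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
    "null": type(None),
}


def _matches_type(value: Any, name: str) -> bool:
    # bool is a subclass of int in Python, so exclude it from numeric types.
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _PY_TYPES[name])


def validate(value: Any, schema: dict, path: str = "$") -> list:
    """Return error messages; an empty list means `value` conforms."""
    expected = schema.get("type")
    if expected is not None and not _matches_type(value, expected):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], subschema, f"{path}.{key}"))
    return errors


# Hypothetical schema for a status-like post type.
status_schema = {
    "type": "object",
    "required": ["text"],
    "properties": {"text": {"type": "string"}, "location": {"type": "object"}},
}
print(validate({"text": "hello"}, status_schema))  # []
print(validate({"text": 42}, status_schema))  # ['$.text: expected string, got int']
```

A server enforcing schemas would reject a post whenever this kind of check returns a non-empty list.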

Semantic URIs

Individual fields within a schema can be tagged with a semantic URI. This lets post type creators indicate that the same type of information is used across many posts, for example the first name of a person, human eye color, or geographic coordinates on Mars. Combined with post body search, this allows apps and users to search all posts for references to a particular type of data, e.g. all references to blue human eyes.
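A sketch of how semantic URIs could drive search across post types. It assumes each field in a schema may carry a `semantic` key holding a URI; that key name, the example.com URIs, and the `search` helper are all hypothetical. Two post types with differently named fields (`eyes`, `iris_color`) both match because they share one semantic URI.

```python
# Hypothetical schemas keyed by post type URI. The "semantic" key name and
# every URI below are illustrative assumptions, not part of a published spec.
EYE_COLOR = "https://example.com/semantics/human-eye-color"

schemas = {
    "https://example.com/types/profile/v0#": {
        "properties": {"eyes": {"type": "string", "semantic": EYE_COLOR}},
    },
    "https://example.com/types/medical/v0#": {
        "properties": {"iris_color": {"type": "string", "semantic": EYE_COLOR}},
    },
}


def fields_tagged(schema: dict, semantic_uri: str) -> list:
    """Names of top-level fields in `schema` tagged with `semantic_uri`."""
    return [
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("semantic") == semantic_uri
    ]


def search(posts: list, semantic_uri: str, wanted: object) -> list:
    """Posts of any type whose field tagged `semantic_uri` equals `wanted`."""
    hits = []
    for post in posts:
        schema = schemas.get(post["type"], {})
        fields = fields_tagged(schema, semantic_uri)
        if any(post["content"].get(f) == wanted for f in fields):
            hits.append(post)
    return hits


posts = [
    {"type": "https://example.com/types/profile/v0#", "content": {"eyes": "blue"}},
    {"type": "https://example.com/types/medical/v0#", "content": {"iris_color": "blue"}},
    {"type": "https://example.com/types/profile/v0#", "content": {"eyes": "brown"}},
]
print(len(search(posts, EYE_COLOR, "blue")))  # 2
```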

Links

Examples

Schema:

{
  "tentLink": {
    "notify": true
  }
}

Link Object:

{
  "entity": "https://jonathan.cupcake.is"
}

Schema:

{
  "tentLink": {
    "notify": true,
    "minSpecificity": "post",
    "maxSpecificity": "version"
  }
}

Link Object:

{
  "entity": "https://jonathan.cupcake.is",
  "post": "asdf-1234",
  "version": "sha512t256-e72efc70a20c1a9c177309be9b83dbc9faf782f0758f54a2d875e0b22d828d45"
}
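The link examples suggest that `minSpecificity` and `maxSpecificity` bound how precisely a link may point: at an entity, a post, or a specific version of a post. The following sketch checks a link object against a `tentLink` rule, assuming the ordering entity < post < version, that a more specific field requires all less specific ones, and that a missing bound allows the full range. These semantics are inferred from the examples rather than stated in the proposal, and `notify` is not modeled; `link_specificity` and `check_link` are hypothetical names.

```python
# Assumed ordering of link specificity, least to most specific. This ordering
# is inferred from the examples, not taken from a published Tent spec.
LEVELS = ["entity", "post", "version"]


def link_specificity(link: dict) -> str:
    """Return the most specific level present in a link object.

    Assumption: each level requires every less specific level, so a
    "version" without a "post" (or a "post" without an "entity") is malformed.
    """
    present = [name in link for name in LEVELS]
    depth = 0
    while depth < len(LEVELS) and present[depth]:
        depth += 1
    if depth == 0:
        raise ValueError("link object has no entity")
    if any(present[depth:]):
        raise ValueError("link skips a less specific level")
    return LEVELS[depth - 1]


def check_link(link: dict, rule: dict) -> bool:
    """True if `link` satisfies a schema's "tentLink" rule (notify is ignored)."""
    rank = LEVELS.index(link_specificity(link))
    # Assumed defaults when the schema omits a bound: allow the full range.
    low = LEVELS.index(rule.get("minSpecificity", "entity"))
    high = LEVELS.index(rule.get("maxSpecificity", "version"))
    return low <= rank <= high


rule = {"notify": True, "minSpecificity": "post", "maxSpecificity": "version"}
entity_only = {"entity": "https://jonathan.cupcake.is"}
versioned = {
    "entity": "https://jonathan.cupcake.is",
    "post": "asdf-1234",
    "version": "sha512t256-e72efc70a20c1a9c177309be9b83dbc9faf782f0758f54a2d875e0b22d828d45",
}
print(check_link(entity_only, rule))  # False: the rule requires at least a post
print(check_link(versioned, rule))  # True
print(check_link(entity_only, {"notify": True}))  # True
```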

Attachments

Schemas will include the following information about attachments:

Servers will be required to enforce schemas on incoming posts. This means that when creating or receiving posts of a new type, the server will first need to fetch the appropriate schema. Servers will permanently cache any schemas they encounter, to limit traffic to schema servers and minimize the consequences of a schema becoming unavailable. Some server implementations and distributions will also ship preloaded with a collection of schemas.

To combat link rot and provide high availability, there will be a directory at tent.io that tracks all known schemas and keeps a backup copy of each in case the original host cannot be reached. Post type creators may also choose to use this directory as the primary URI for their schemas.
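A sketch of the server-side lookup this describes: consult the permanent cache (possibly preloaded by the distribution), otherwise fetch from the post type URI, and fall back to the backup directory if that host is unreachable. `fetch_origin` and `fetch_mirror` are hypothetical hooks standing in for the actual HTTP requests; no directory URL or API is assumed.

```python
from typing import Callable, Optional

# A fetcher returns the schema for a type URI, or None if it is unreachable.
Fetcher = Callable[[str], Optional[dict]]


class SchemaCache:
    """Sketch of a permanent, URI-keyed schema cache with a backup source.

    `fetch_origin` and `fetch_mirror` are hypothetical hooks standing in for
    fetching the type URI itself and querying a backup directory.
    """

    def __init__(self, fetch_origin: Fetcher, fetch_mirror: Fetcher,
                 preloaded: Optional[dict] = None) -> None:
        self._fetch_origin = fetch_origin
        self._fetch_mirror = fetch_mirror
        # Some distributions may ship with common schemas already loaded.
        self._cache: dict = dict(preloaded or {})

    def get(self, type_uri: str) -> dict:
        # Schemas are immutable, so a cached copy never needs refreshing.
        if type_uri in self._cache:
            return self._cache[type_uri]
        schema = self._fetch_origin(type_uri)
        if schema is None:  # original host unreachable: try the backup
            schema = self._fetch_mirror(type_uri)
        if schema is None:
            raise LookupError(f"schema unavailable: {type_uri}")
        self._cache[type_uri] = schema
        return schema


calls = []

def origin(uri):
    calls.append(("origin", uri))
    return None  # simulate an unreachable host

def mirror(uri):
    calls.append(("mirror", uri))
    return {"type": "object"}

cache = SchemaCache(origin, mirror)
uri = "https://tent.io/types/status/v0#"
cache.get(uri)
cache.get(uri)
print(len(calls))  # 2: the second lookup is served from the cache
```

Because schemas never change, the cache needs no expiry logic; immutability is what makes a permanent cache safe here.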

poweruser82 commented 10 years ago

Schemas could be regular Tent posts, so they could also be stored in a decentralized fashion. The URI would take precedence, but if necessary, servers could ask another server (likely the one that sent the unknown post type) for a copy of a schema. Apps could also send the schemas they use to the server.

danielsiders commented 10 years ago

@poweruser82 I'm not sure how Tent posts are more or less decentralized than web-accessible documents. At the very least, you get into a source-of-truth problem if multiple versions of a schema are being passed around (one server implementation with a parsing bug and you've got a malformed schema running around). We may approach the p2p circulation issue at a later date, but it's definitely premature right now. Let's get schemas working first. We'd definitely like to see a more decentralized solution long term, but it isn't a vital issue now. In the meantime, Cupcake and Skate between them see a huge portion of the overall Tent traffic. That reporting, combined with preloading common post type schemas in self-hosting distros and users and developers submitting schema links to tent.io, should solve the availability problem until 1.0.

poweruser82 commented 10 years ago

Websites can become inaccessible. Even tent.io's directory could be blocked by someone for some reason. I agree it's not a big problem now; the post type volatility of the early days can be managed by the directory and in-server caches. However, considering schemas are going to be saved on servers anyway, I don't see why we shouldn't use a regular post.

danielsiders commented 10 years ago

@poweruser82 It's something we discussed extensively over the past few months but it creates significantly more problems than it solves, especially in the short term. We'll continue to explore it moving forward.

quentez commented 10 years ago

Will the schema define whether a post is a singleton or not?

cuibonobo commented 10 years ago

There's something that concerns me about the immutability of schemas: how do you test new post types? I wonder if there could be a flag or something that could be set to say "please don't remember this schema, I'm still in a testing phase".

I suppose this could be solved by having a development server to test against, but it seems like setting up a Tent server is non-trivial at the moment and I don't really want to mess with it.

titanous commented 10 years ago

@quentez Maybe.

danielsiders commented 10 years ago

@jenmontes Schemas are immutable, but posts can be deleted. There's nothing stopping you from creating a million posts in a test format and deleting them later. As for servers remembering them, it's just a matter of creating a new schema or a new version of your existing schema and publishing all new content with that one. You really don't want to mess around with the immutability of schemas if you don't have to. They're identified by URIs, so there's an unlimited namespace. It's like creating one webpage at danielsiders.com/test and later publishing the real version at danielsiders.com/therealthing. If you're nervous, include the word "testing" in all your test schemas. In other words, there's zero cost to your server remembering all the schemas ever.

cuibonobo commented 10 years ago

I had considered the possibility of just using a testing name and that's fine, but in that case the server needs to give feedback about which post type names are taken. I also think that a server that encounters a new schema should somehow propagate that schema to all other servers that it knows about. Otherwise we might run into the possibility of uploading posts with my shiny new schema to Server A, but Server B may reject those posts because it has a different schema with the same name.

danielsiders commented 10 years ago

@jenmontes The "name" of a schema is its URI, not its "title", e.g. https://tent.io/types/status/v0# rather than status, so there's not really a risk of conflict. The global namespace is URIs, not arbitrary names. The only way they can collide is if you post a schema at a URI, delete it, and post another one at the same address. Don't do that. For folks who don't trust themselves, we'll also host post schemas at tent.io and promise not to pollute URIs.

We may eventually add server-to-server gossip about post types, and it might be in 1.0, but early on that's more likely to cause cascading corruption than anything else.
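Because a schema's name is its URI and schemas are immutable, a server could pin whatever it first sees at a URI and flag any later, different schema at that address as the "don't do that" case above. A minimal sketch, assuming a SHA-256 digest over a simple canonical JSON encoding (sorted keys, compact separators) is an acceptable identity check; `PinnedSchemas` is hypothetical, and this encoding is not a full JSON canonicalization scheme.

```python
import hashlib
import json


def schema_digest(schema: dict) -> str:
    """SHA-256 of a simple canonical JSON encoding (sorted keys, compact).

    Illustrative only: this is not a full JSON canonicalization scheme.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PinnedSchemas:
    """Hypothetical helper: pin the first schema seen at each type URI."""

    def __init__(self) -> None:
        self._pins: dict = {}

    def accept(self, type_uri: str, schema: dict) -> bool:
        """True if `schema` matches the pinned copy (pinning it if new)."""
        digest = schema_digest(schema)
        return self._pins.setdefault(type_uri, digest) == digest


pins = PinnedSchemas()
uri = "https://example.com/types/testing-status/v0#"
print(pins.accept(uri, {"type": "object", "required": ["text"]}))  # True
print(pins.accept(uri, {"required": ["text"], "type": "object"}))  # True
print(pins.accept(uri, {"type": "object"}))  # False: reused URI
```

Key order does not matter because of `sort_keys=True`; a genuinely new schema simply gets a new URI, such as a `v1#` suffix.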

redaktor commented 10 years ago

Just wanted to add a nice read for newbies on Kris Zyp's JSON Schema and JSON Hyper-Schema: http://brandur.org/elegant-apis

Official specifications: http://json-schema.org