IPLD Type Convention - Githubissues

Stebalien commented 6 years ago

Motivation 1: I'd like to be able to look at an IPLD object and know, approximately, it's intended interpretation (without guessing or using context).

Motivation 2: I'd like to be able to define or extend a type system for my IPLD application without having it completely fail to interop with other IPLD type systems.

Motivation 3: I'd like to buy some time to figure out the perfect type system.

We've been discussing IPLD type systems but these discussions usually boil down to implementing the perfect system. I'd like to propose an alternative: the IPLD type convention.

Proposal: @type: $something denotes a type. What this type means depends on the type's type (if specified).

Why @? It's less likely to conflict but I'm not fixated on it.

Why "the IPLD type convention"? This isn't a specification. Basically, I'm giving in to JSON duck-typing and calling it "good enough".

Why is it good enough? This is a decentralized system so we'll have to check everything anyways. Trying to prescribe structure on users tends to lead to more trouble than it's worth (IMO). If we need more structure, we can always give the type a type to make sure we're operating within the correct type system.

How will this work with existing formats:

CBOR/JSON: Do nothing. For existing objects without a @type, these objects simply don't have types (within this system). Type systems that need to give everything some type can just give these some
Git (tree, commit, etc), Eth, etc: I'd like to retroactively add in a type field. Thoughts? I kind of doubt this will break anything.

We've also talked about adding a new format with the structure <CidOf(type)><data>. That is, introduce a new format where we put all the type and schema information in a separate object, prepending the CID of this separate object to the actual object (the value).

After considering this for a bit, I've realized we should treat these as separate concerns: we're conflating types with schemas. There's no reason we can't introduce this new, compressed format at some later date even if we go with the above "type convention" proposal.

Disclaimer: this was not my idea, I've just finally convinced myself that it's probably "good enough".

Thoughts @jonnycrunch (you're the one who told me to look into the JSON-LD spec), @diasdavid, @davidad, @whyrusleeping?

While I'd like to avoid prescribing too much, I'd like to define a set of conventions that users should follow. For example:

@type: CID - CID points to the actual type.
@type: {}: inline type. This will often be used for type "constructors". For example: {@type: {@type: "generic", constructor: CidOfConstructor, parameters: [...]}.
@type: "string": A human readable string/path. IMO, this should usually be used to specify the type system.
@type: 1234: A multicodec. A reasonable type-of function would look this multicodec up in the multicodec table to map it to a CID.
@type: [thing1, thing2, thing3]: multiple types.

jonnycrunch commented 5 years ago

@Stebalien Seems that you are most concerned with simple General Purpose data types definitions to start.

"simpleTypes": {
      "enum": [
        "array",
        "boolean",
        "integer",
        "null",
        "number",
        "object",
        "string"
      ]
    },

You could then build upon this and add support for more complex data types to give more meaning and context. This would help with validation of the structure.

An simple example extension would be date and datetime, which are an extension of { "@type" : "string"} but the context would define the syntax of the string.

Presently, the handling of datetime in @context is xsd:datetime which references http://www.w3.org/2001/XMLSchema#, where the explaination as documentation source is in html.

My favorite annotation in this xml is:

First the built-in primitive datatypes. These definitions are for information only, the real built-in definitions are magic.

Example for string from XMLSchema:

<xs:simpleType name="string" id="string">
<xs:annotation>
<xs:appinfo>
    <hfp:hasFacet name="length"/>
    <hfp:hasFacet name="minLength"/>
    <hfp:hasFacet name="maxLength"/>
    <hfp:hasFacet name="pattern"/>
    <hfp:hasFacet name="enumeration"/>
    <hfp:hasFacet name="whiteSpace"/>
    <hfp:hasProperty name="ordered" value="false"/>
    <hfp:hasProperty name="bounded" value="false"/>
    <hfp:hasProperty name="cardinality" value="countably infinite"/>
    <hfp:hasProperty name="numeric" value="false"/>
</xs:appinfo>
<xs:documentation source="http://www.w3.org/TR/xmlschema-2/#string"/>
</xs:annotation>

And more specfically for datetime:

<xs:simpleType name="dateTime" id="dateTime">
  <xs:annotation>
  <xs:appinfo>
    <hfp:hasFacet name="pattern"/>
    <hfp:hasFacet name="enumeration"/>
    <hfp:hasFacet name="whiteSpace"/>
    <hfp:hasFacet name="maxInclusive"/>
    <hfp:hasFacet name="maxExclusive"/>
    <hfp:hasFacet name="minInclusive"/>
    <hfp:hasFacet name="minExclusive"/>
    <hfp:hasProperty name="ordered" value="partial"/>
    <hfp:hasProperty name="bounded" value="false"/>
    <hfp:hasProperty name="cardinality" value="countably infinite"/>
    <hfp:hasProperty name="numeric" value="false"/>
  </xs:appinfo>
<xs:documentation source="http://www.w3.org/TR/xmlschema-2/#dateTime"/>
</xs:annotation>
<xs:restriction base="xs:anySimpleType">
<xs:whiteSpace value="collapse" fixed="true" id="dateTime.whiteSpace"/>
</xs:restriction>
</xs:simpleType>

More complex data types would be generators and would help with the IPLD selectors issue.

As far as JSON-LD, I'm rolling back my support of it. There is SO much reliance on location-based mappings. I have really started to look into how to strip out all reliance of the location and make it a pure content-addressed schema. But I'm getting a "symbol grounding problem". Also, there is so much self-referencing that my script fails given the cycles in the graph. I have moved away from the w3c model and looking at wikidata. Unfortunately, there isn't a good mapping for simple data types like datetime above. I think your "good enough" approach expresses the fundamental "intentionality" that point the user in the direction of the proper meaning. Wikidata's approach is to give many and and allow users to "triangulate" the meaning, especially across languages.

I, myself, like inline link <cid>:

{
    ...
  @type : {  "/" : "<cid>" }
}

The use of @type to denote the object is an instance of a class of entities.

The problem is what do you link to? You'd be building a whole ontology for data types.

BTW, there is an ISO standard for General Purpose Datatypes.

If you keep it simple and start with strings, and those strings are defined in the @context.

{
  @context : {  "/" : "<cid>" }, 
    ...
  @type : "string"
}

More complicated examples can build out from here.

In theory, this syntx for links should be compatible with json-ld, but in practice it not. see my issue #110 in JSON-ld.

pavetok commented 5 years ago

I'd like to be able to look at an IPLD object and know, approximately, it's intended interpretation (without guessing or using context).

Why does it necessary to embed type information into data itself? Modern CTT, for example, says that typing judgments are completely separate things. As consequence one value can inhabit multiple types.

ipld / specs

IPLD Type Convention #71