I'm not sure what the best way to communicate is, but since the Protocol Labs MO seems to be to do everything on GitHub, here is an issue.

Schema kinds

Why are these called kinds? In the context of type theory, kinds have a specific meaning, and calling these kinds seems confusing, at least to me.

Here is the association I have when hearing about kinds in the context of a type system: https://en.wikipedia.org/wiki/Kind_(type_theory)

Schema representations

## Fizzlebop is a pair of fields which serializes as "value-of-a:value-of-b" as a string.
type Fizzlebop struct {
    a String
    b String
} representation stringjoin {
    join ":"
}

Unless stringjoin prescribes some kind of escaping scheme, the type of a and b is not String but the subtype of all strings that do not contain the ':' character; otherwise the representation is not unique and thus not reversible. So saying that a can be any string is basically a lie.

Is something like "strings that do not contain ':'" expressible in IPLD schema?
I find representations for a limited subset of a type somewhat frightening. The type definition is no longer that useful, since you need to know the representation to know the limits. Also, the information that the strings may not contain ':' is only very implicitly available and thus not obvious.
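To make the concern concrete, here is a minimal Python sketch of the collision. The function `naive_join` is a hypothetical stand-in for a stringjoin representation without escaping; it is not taken from any IPLD library.

```python
# Hypothetical stand-in for a stringjoin representation with no escaping.
def naive_join(fields, join=":"):
    return join.join(fields)

# Two different (a, b) pairs for Fizzlebop...
first = naive_join(["x:y", "z"])
second = naive_join(["x", "y:z"])

# ...collapse to the same serialized string, so decoding cannot be unique.
collision = first == second
print(first, second, collision)
```

Given only the serialized string, a decoder has no way to tell which of the two pairs was intended.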

@warpfork can you enlighten me about the rationale behind these decisions?

kinds

The kinds enumeration at the Data Model layer is found here: https://github.com/ipld/specs/blob/master/data-model-layer/data-model.md#kinds

The kinds enumeration in the Schema layer is similar but adds a few more (e.g. struct, enum, etc).

I'd say the meaning isn't wildly different from the type theory one, although we don't really intend to use the word with any kind of "rank-N"/"higher-order" systems. The way we use the word here is also a lot like the way Go uses the word in its type system: https://golang.org/pkg/reflect/#Kind
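As a rough illustration of "kind" in this reflect-like sense, the Python sketch below classifies a value by its Data Model kind. The kind names follow the list in the linked Data Model document as I understand it; `Link` and `kind_of` are hypothetical names invented for this example.

```python
class Link:
    """Hypothetical placeholder for a content-addressed link."""
    def __init__(self, cid):
        self.cid = cid

def kind_of(value):
    """Return the Data Model kind name for a Python value (illustrative only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    if isinstance(value, Link):
        return "link"
    raise TypeError("value has no Data Model kind: %r" % (value,))

print(kind_of({"a": 1}), kind_of([1, 2]), kind_of("hi"))
```

The point is only that a kind here is a coarse category of value, not a "type of types" in the higher-kinded sense.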

representations and escaping

tl;dr: Yes, you're right -- the implication of the combination of these features is not entirely pleasing. As currently planned, the struct stringjoin representation simply won't be valid for all ranges of values.

How did we get here?

We don't want to try to support dependent types in general. Too much scope to implement, among many other things.
"Strings that do not contain ':'" is not meant to be a feature we support. It's an example of something that broaches dependent types.
And yet there are protocols out there which use prefixes such as this, and I want to be able to support them and work well with them.
Having representations for structs which explicitly compress them down to strings also gives us a way to use "structs" as keys in maps, which is an extremely useful feature that would otherwise be impossible in the face of several other choices we've made.
While implementing stringery like this with escaping systems would be a cardinality-preserving approach, the costs in opinion complexity (are we going to let people choose their escaping style?) and implementation complexity (multiplied by the number of language implementations that all need to match!) are staggering, and I suspect such complexity won't be survivable.
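For comparison, this is roughly what the escaping alternative would involve. It is a sketch only: the backslash escape character and the names `escaped_encode` and `escaped_decode` are assumptions made for this example, and nothing here is planned or specified for IPLD schemas.

```python
SEP = ":"
ESC = "\\"  # assumed escape character, chosen only for this example

def escaped_encode(fields):
    """Join fields with SEP, escaping ESC first and then SEP inside each field."""
    escaped = [f.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for f in fields]
    return SEP.join(escaped)

def escaped_decode(text):
    """Split on unescaped SEP and undo the escaping."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == ESC:
            if i + 1 >= len(text):
                raise ValueError("dangling escape character")
            current.append(text[i + 1])  # take the escaped character literally
            i += 2
        elif ch == SEP:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

print(escaped_encode(["x:y", "z"]))
print(escaped_decode(escaped_encode(["x", "y:z"])))
```

Even this small version needs a character-level parser on the decoding side, and every language implementation would have to agree on the escape character and on the error cases.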

So, there's indeed a series of "between a rock and a hard place" issues with this. The approach of rejecting values which would reach unhappy paths does result in the statement "The type definition is no longer that useful since you need to know the representation to know the limits" being then, unfortunately, true -- and that definitely conflicts with some of our fundamental design goals. It also seems to be the only viable approach to providing the feature. So here we are.
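The "reject values which would reach unhappy paths" approach could look something like the following Python sketch. The names `strict_encode` and `strict_decode` are hypothetical, and the exact error behaviour is an assumption, since the feature is only planned.

```python
def strict_encode(fields, join=":"):
    """Join fields, refusing any field that contains the join string."""
    for field in fields:
        if join in field:
            raise ValueError("field %r contains the join string %r" % (field, join))
    return join.join(fields)

def strict_decode(text, field_count, join=":"):
    """Split into exactly field_count parts, or fail."""
    parts = text.split(join)
    if len(parts) != field_count:
        raise ValueError("expected %d fields, got %d" % (field_count, len(parts)))
    return parts

print(strict_encode(["x", "y"]))
print(strict_decode("x:y", 2))
```

Every value that encodes successfully round-trips, at the price of the struct type silently accepting fewer values than its field types suggest.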

The only consolations I can offer are: one, this only comes up with things that were "countable infinity" cardinality already (namely, strings), which seems slightly less bad; two, it's very much an opt-in feature that can be avoided as much as you like (and we can and should put appropriate caveats around it).

For consideration: the most common example I've come across for using this kind of stringjoin struct is in the expression of some sort of "plugin" system, where there's a prefix indicating which of the "plugins" to use, and the remainder is an "argument" to that plugin. In this sort of use case, the first segment is effectively an enum anyway, and thus this otherwise valid concern becomes moot in practice.
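A sketch of that plugin-style use case in Python follows. The plugin names and `parse_plugin_ref` are invented for illustration, and splitting on the first separator only is an assumption of this example; the thread does not say how the remainder would be treated.

```python
# Hypothetical set of plugin names; the prefix behaves like an enum.
KNOWN_PLUGINS = {"fs", "http"}

def parse_plugin_ref(text, join=":"):
    """Split 'plugin:argument' at the first separator and validate the prefix."""
    name, sep, argument = text.partition(join)
    if not sep:
        raise ValueError("missing separator in %r" % text)
    if name not in KNOWN_PLUGINS:
        raise ValueError("unknown plugin %r" % name)
    return name, argument

print(parse_plugin_ref("http:example.org:8080"))
```

Because the prefix comes from a fixed set of names that never contain the separator, the first field cannot cause an ambiguous split.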

We definitely need more documentation and warnings around these limitations. (The current state of this feature is "planned" more so than any degree of "shipped", so the docs are currently very much an outline -- sorry :) More warnings == yes please!) If there are things we can do to improve the overall odds of using the feature well and leading users to avoid using it badly -- for example, having schema "compile time" validators that detect and error on invalid compositions that are certain to trigger irreversible concatenations -- I definitely want to explore that as well.
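One possible shape for such a "compile time" check, sketched in Python. The dictionary layout of `schema` and the function `find_certain_collisions` are hypothetical, and the rule it applies (a stringjoin struct nested inside another stringjoin struct with the same join string) is just one composition that seems certain to collide, not a complete validator.

```python
def find_certain_collisions(schema):
    """Return (type, field) pairs where a stringjoin struct embeds another
    stringjoin struct that uses the same join string."""
    problems = []
    for type_name, type_def in schema.items():
        rep = type_def.get("representation", {})
        if rep.get("strategy") != "stringjoin":
            continue
        for field_name, field_type in type_def["fields"].items():
            inner = schema.get(field_type)
            if inner is None:
                continue  # builtin type such as String
            inner_rep = inner.get("representation", {})
            if (inner_rep.get("strategy") == "stringjoin"
                    and inner_rep.get("join") == rep["join"]
                    and len(inner["fields"]) >= 2):
                problems.append((type_name, field_name))
    return problems

# Hypothetical in-memory schema: Outer embeds Inner, both joined with ":".
schema = {
    "Inner": {"fields": {"a": "String", "b": "String"},
              "representation": {"strategy": "stringjoin", "join": ":"}},
    "Outer": {"fields": {"x": "Inner", "y": "String"},
              "representation": {"strategy": "stringjoin", "join": ":"}},
}
problems = find_certain_collisions(schema)
print(problems)
```

Here every encoded Inner value contains ":", so Outer can never be split back into its two fields unambiguously; that much can be reported without looking at any data.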

Perhaps also worth mentioning on related topics: there are already some other examples of limitations in composability of some of the features in schemas. Some of these simply emerge from the combinations of features we want.

For example, "kinded unions" have all sorts of fun limitations, and these are based on the representation strategy: for example, a kinded union can contain two different struct types, so long as one is for example a map in representation, and the other is a list (via the 'tuple' representation strategy); or we could add a third via one of the representation strategies that becomes a string. This is an interesting example because the validity of the composition depends on the representations, even though the cardinality logic we can use on the composition can still disregard the representation.
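The kinded-union rule can be sketched as a uniqueness check on representation kinds, followed by dispatch on the kind of the incoming value. All names below (`check_kinded_union`, `pick_member`, and the member types Foo, Bar, Baz) are made up for this Python example.

```python
def check_kinded_union(members):
    """members maps type name -> representation kind.
    Returns a kind -> type name table, or raises if two members share a kind."""
    by_kind = {}
    for type_name, rep_kind in members.items():
        if rep_kind in by_kind:
            raise ValueError("%s and %s both have representation kind %s"
                             % (by_kind[rep_kind], type_name, rep_kind))
        by_kind[rep_kind] = type_name
    return by_kind

def pick_member(by_kind, value):
    """Choose the union member from the kind of the decoded value."""
    if isinstance(value, dict):
        kind = "map"
    elif isinstance(value, list):
        kind = "list"
    elif isinstance(value, str):
        kind = "string"
    else:
        raise ValueError("unsupported value for this sketch")
    if kind not in by_kind:
        raise ValueError("no union member with representation kind %s" % kind)
    return by_kind[kind]

# Three structs: default map representation, tuple (list), and a string form.
table = check_kinded_union({"Foo": "map", "Bar": "list", "Baz": "string"})
print(pick_member(table, ["a", "b"]))
```

Two structs that both use the default map representation would fail the check, even though nothing about their fields conflicts.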

Similarly, "inline unions" have several limitations: they only work when all member types have representation kind map; and if any of those maps are representations of structs with a field name that's the same as the union discriminant key, then we can statically say this is an invalid composition.
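Both inline-union conditions can be checked statically. The Python sketch below assumes a hypothetical description of each member (its representation kind and its field names); `check_inline_union` is not an existing API.

```python
def check_inline_union(discriminant_key, members):
    """members maps type name -> {"rep_kind": str, "fields": [str, ...]}.
    Returns a list of human-readable problems (empty if the union is valid)."""
    errors = []
    for type_name, info in members.items():
        if info["rep_kind"] != "map":
            errors.append("%s: representation kind is %s, not map"
                          % (type_name, info["rep_kind"]))
        elif discriminant_key in info["fields"]:
            errors.append("%s: field %r collides with the discriminant key"
                          % (type_name, discriminant_key))
    return errors

members = {
    "Circle": {"rep_kind": "map", "fields": ["radius"]},
    "Label": {"rep_kind": "map", "fields": ["type", "text"]},
    "Point": {"rep_kind": "list", "fields": ["x", "y"]},
}
errors = check_inline_union("type", members)
print(errors)
```

In this made-up example, Label is rejected because its own "type" field would clash with the discriminant, and Point is rejected because a tuple representation leaves no map to inline the discriminant into.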

So, the limitations around stringjoin and similar representations of structs are not alone in having required consideration of a tug of war between utility and practicality and purity! :)

Regarding kinds: I have never written Go, so this did not sound familiar to me. I have written lots of Scala, where kinds are usually used in the context of higher-kinded types, as in the linked Wikipedia article. I'm still not entirely happy with the name, but I guess I can live with it.

Regarding the representations, I guess it would help if there was some kind of description of the goals of the schema layer. Maybe it exists in one of the zillion repos and I have not found it yet...

I guess I don't fully get why you need this extreme representation flexibility. I originally thought that this was just an attempt to add more type information to e.g. dag-cbor, but it seems that you want to have the ability to use the schema layer on top of arbitrary protocols, which then of course leads to the sort of compromises you had to make.

"And yet there are protocols out there which use prefixes such as this that I want to be able to support and work well with;"

"I guess I don't fully get why you need this extreme representation flexibility. I originally thought that this was just an attempt to add more type information to e.g. dag-cbor, but it seems that you want to have the ability to use the schema layer on top of arbitrary protocols, which then of course leads to the sort of compromises you had to make."

Sort of. We're seeing IPLD as mainly operating above the protocols/codecs, which is why the "data model" exists: it's a kind of lowest common denominator of things we can get the codecs we care about to work with (some of which require a bit of coercion or compromise). If we accept the data model as usable, the next question is how to describe more complex "types" that are built using the individual values in the data model, and that is where schemas come in. One reason we've accepted "kinds" as a word to describe the values in the data model is that we hope to be moving to a place where we refer to them less and to "types" more often: the things that are more formally composed at a layer above the data model.

That's the theory, anyway. It's all very new and in flux (hence the documentation problems), so now's a great time to influence the thinking as it evolves. As we push into practical (non-IPFS) use cases of these things, we're hitting edges that are a little uncomfortable, so we come back for discussion, and we're pretty open to adjusting course. As an example of this, see https://github.com/ipld/specs/issues/144 where (some of us) initially assumed that we could build some things we want now on top of schemas, but found that the fit isn't quite right, so there's a bit of a fork taking place. We'd like to reconcile that and bring them back into alignment, but right now schemas are a new idea and there's very little infrastructure in place to start building on top of them or extending them.

Some activity right now (might be useful to someone browsing this issue, in lieu of more expansive docs, hopefully):

@mikeal is pursuing his composites idea in JavaScript @ https://github.com/ipld/js-composites (this currently avoids schemas, but there's a goal to bring it into alignment and hopefully use schemas in some way to power it; see https://github.com/ipld/specs/issues/130 for some of that mess)
@warpfork is powering ahead with some codegen @ https://github.com/ipld/go-ipld-prime/pull/21
I started work on a parser for JavaScript. It can parse and do some basic validation and transformation. I was in the middle of building a test suite for it that we could use across runtimes, but I paused that work to shift to some things that are more value-adding in the short term. https://github.com/rvagg/js-ipld-schema
@whyrusleeping has done some work on a parser for Go. I'm not sure of the status of that, but perhaps it'll be used for some of the codegen work. https://github.com/whyrusleeping/ipld-schema
Filecoin did some migration to schemas for describing their specs, it's been pushing the boundaries of what schemas can do: https://github.com/filecoin-project/specs/pull/355 (including the desire for a UInt that our data model doesn't support, see discussion @ https://github.com/ipld/specs/pull/139). There may be other PRs for Filecoin re specs.
I've been using schemas to describe types in some of my recent specs: https://github.com/ipld/specs/pull/131 & https://github.com/ipld/specs/pull/138 (I think it's been quite helpful in pinning down specifics rather than resorting to the usual language-specific descriptions that have happened in the past; see that Filecoin specs PR for how Go-ified this has been before).
I think @warpfork's work on the Selectors spec @ https://github.com/ipld/specs/blob/master/selectors/selectors.md was the earliest incarnation of schemas; at least it's the first I saw of them, and I believe it was useful in building out the concepts.

Questions about ipld schema #140
