Merge data-models and properties

jonathanrobie commented 9 years ago

This is an attempt to resolve a variety of issues involving data models, properties, and the relationship between them. I am committing the changes, but leaving the issue open and waiting to create a build until it's clear that the three of us agree.

Instead of having both data models and properties, we would have only properties.

#
# Properties
#

global-properties = element global-properties {
  top-level-properties*
}

top-level-properties = element properties {
  name,
  uri?,
  documentation*,
  schema?,
  examples?,
  (nested-properties | property | global-properties-ref )*
}

nested-properties = element properties { name, documentation*, ((nested-properties | property )*  | global-properties-ref ) }
property = element property { name, uri?, occurs?, documentation*, text? }
global-properties-ref = element properties { attribute ref {xsd:string} }

Properties for states, input transitions, and references from within properties are all implemented using global properties references, which point to these named properties. They can no longer define properties inline. Global properties can refer to a schema. Both global properties and individual properties can have URIs that may be associated with semantics.

I think this simplifies some ugly corners of the schema, but it also invites support for cardinality and data types. I didn't add the menu to the restbucks example, but I think doing that will illustrate the issues I'm thinking of. Regardless, this definition allows properties to be specified using an external schema, directly in the global properties definition, or both.
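
For illustration, an instance of this grammar might look like the sketch below. It assumes that name and uri are attribute patterns defined elsewhere in the RADL schema (they are not shown above); the property names and the example.com URI are hypothetical.

```xml
<!-- Sketch only: assumes name and uri are attributes; all names are hypothetical. -->
<global-properties>
  <!-- A named top-level set with one nested set, defined inline. -->
  <properties name="items">
    <properties name="item">
      <property name="name"/>
      <property name="quantity"/>
      <property name="price"/>
    </properties>
  </properties>

  <!-- A second named set that reuses "items" by reference and carries a URI for its semantics. -->
  <properties name="menu" uri="http://example.com/semantics/menu">
    <properties ref="items"/>
  </properties>
</global-properties>
```

A state or input transition would then point at one of these named sets by reference rather than declaring properties inline.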

RaySinnema commented 9 years ago

I don't like the term global-properties.

The structure of RADL is very consistent: we have lists of items that contain items, and we name the lists after the items. For example, the <resources> element contains a list of <resource> elements. I'd very much like to stick with this pattern, which means we need a term with both a singular and plural form.

global-properties is not the plural of properties. Also, we use the term properties in two different meanings here, which may be confusing.

Since we've been struggling to come up with a good term, it makes sense to look at what other people are doing:

- API Blueprint calls them "data structures" and defines them inside the API description in Markdown Syntax for Object Notation
- JSON API calls them "attributes" and allows informal internal JSON structures
- RAML calls them "schemas" and allows both inline and external definitions in either XML Schema or JSON Schema
- Swagger calls them "definitions" and uses a subset of JSON Schema. But Swagger UI refers to "models" and "model schemas" in the Data Type for the body parameter
- WADL calls them "grammars" and allows internal as well as external definitions in XML Schema or RELAX NG

Clearly the existing API description languages don't agree on a single term. RESTUnited deals with multiple API description formats and so had to come up with a term that covers all of them: "models".

I think "definition" and "model" are too vague. "Grammar" is not quite right. "Schema" is right for RAML's use, but not for how we intend to use the term. "Attributes" suffers from the same problems as "properties".

That leaves "data structure", which is too technical. I'm looking for something that's technology independent (in this case meaning independent of media types). The term "logical data model" captures it well, but that's a bit long. So maybe "data model" wasn't so bad after all?

jonathanrobie commented 9 years ago

I think this is about more than finding the right term. We should start by defining a clear story for what a data-model element is and how it relates to the rest of RADL. For instance, in your last comment, you say "RAML calls them ... Swagger calls them ... WADL calls them", but all of these use XML schemas or JSON schemas, which correspond to concrete representations in XML or JSON, and you recently suggested that we rename data-model to message (#31). That's completely different from saying that a data model is a "logical data model".

Let's assume we don't know what to name this outer level element, and call it x-container for now. Why is it there, and what does it represent?

Let's start with the basic model behind RADL: states have properties and transitions, and representations represent properties using XML elements / attributes or JSON objects / arrays. That's clear and simple. States can have properties, representations can have schemas. Some applications define messages using properties, some define them using schemas, and some do both (perhaps using properties just for the properties that directly influence the REST semantics of an API).

Originally, properties were defined directly in states. As I understand it, the reason we introduced x-container is that (1) multiple states may share the same set of properties, and (2) some kinds of documents do not correspond to states, e.g. documents used for input. So we essentially need a name for a container that represents the same thing that properties used to represent in a state. These properties are still abstract, not XML or JSON representations that can be validated with the corresponding schema languages. These properties can be associated with URIs that represent semantics. These properties need to be named so they can be referenced.

Do we agree that this is the model? If so, that should simplify the search for a name.

RaySinnema commented 9 years ago

Yes, that is the model.

jonathanrobie commented 9 years ago

I think we are only likely to use this for data that looks a lot like objects. Here's one possible approach:

- Continue to allow XML schemas (RELAX-NG, XSD) or JSON-Schema to describe representations
- For properties of states, allow a traditional class declaration along these lines:

classes = element classes { class* }
class = element class { name, property* }
class-ref = element class { ref }
property = element property { name, type?, uri?, optional?, repeats? }
repeats = attribute repeats { cardinality }
cardinality = "*" | "+"
optional = attribute optional { "true" | "false" }

The type of a property can be the name of a built in type or the name of a class.

Obviously, this won't work for semi-structured data, but that's really not what we intend to use it for.
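
As a rough sketch of what this grammar would accept, the instance below declares two classes. It assumes that name, type, and optional are attributes, and that "string" and "number" are among the built-in type names; the proposal itself does not list the built-in types, and the class and property names are hypothetical.

```xml
<!-- Sketch only: attribute names and built-in type names are assumptions. -->
<classes>
  <class name="item">
    <property name="name" type="string"/>
    <property name="quantity" type="number"/>
    <property name="price" type="number"/>
    <property name="note" type="string" optional="true"/>
  </class>

  <!-- type="item" names the class above; repeats="+" presumably means one or more. -->
  <class name="order">
    <property name="item" type="item" repeats="+"/>
  </class>
</classes>
```

Because a property can only name a type here, the structure of an item has to be declared as its own class and referenced by name.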

RaySinnema commented 9 years ago

Thanks for the new proposal. Here are some general comments:

I don't like the use of any XML schema, because:
1. They cover both data and documents, while we only need data.
2. They don't support defining semantics.
While JSON Schema seems to have the right expressive power, I also don't like it because
1. The draft expired two years ago. The examples that are out there, like Card, don't even seem to conform to the latest draft. IOW, its status is unclear.
2. It uses JSON syntax, while the rest of RADL uses XML syntax. Merging the two will look funny.
3. It doesn't seem to support defining semantics.
So I think that it makes more sense to define our own little "schema language" in XML, guided by the expressive power of JSON Schema.

About the class declaration proposal:

"Class" is another one of those heavily overloaded words. In the knowledge representation meaning it comes quite close to what we need. In the computer programming meaning, however, it implies behavior, which is really not what we're looking for at all. Given that the latter meaning is probably better known within our target audience, I don't think "class" is a better name than "data model".
We still need a uri attribute at the root level (class in this proposal) to capture its semantics.
optional and repeats partially overlap. Why not use a single cardinality to capture both?
Having the type attribute refer to either a built-in type or a class is a neat trick to simplify the schema. It does have the drawback that nested structures must be flattened using separate class elements, even if they're only ever used inside a single class. I think this common case is going to look bad with this proposal.

jonathanrobie commented 9 years ago

I'm going to respond with two comments. I think we're mostly on the same page, modulo naming, except for what looks like a proposal to remove support for schemas in representations. I don't want to derail moving forward, so I'll respond to the comments about the grammar separately, and address the things I disagree with here.

1. We need a better name for this

Let's get the requirements and structure right and consider names in parallel (perhaps in different comments). Part of what makes this tricky is that different domains have made the class / instance distinction in very different ways. And it's also tricky because we're inventing a concept that is not native to REST - named sets of properties that can be used in more than one state or as input, defined at an abstract level. Whatever name we use, the name corresponds to a single class (in the logical sense, not the OO sense), not an instance or a serialization format or a model. This is absolutely not meant to be used for data modeling, and it does not imply any form of persistence.

And it is optional. You don't have to use it. Some applications will not. We definitely don't want the name to imply that RADL has a data model that is at the heart of all APIs.

2. We still need schema languages

This proposal has nothing to do with removing support for RELAX-NG, XSD, or JSON Schema, which apply to concrete representations, and are on a different level of abstraction than data models or classes or whatever we decide to call it. Both data models and schemas are optional, and I would be strongly opposed to changing that. We need to support many different ways of using RADL, including example-driven design, use of industry standard schemas, etc. Our proprietary "schema" language is more like class declarations in Java, and is not designed to replace standard schema languages.

RaySinnema commented 9 years ago

Yes, we still need schema languages for concrete representations (i.e. serialized "data models"). I was strictly talking within the context of the "data model".

jonathanrobie commented 9 years ago

> We still need a uri attribute at the root level (class in this proposal) to capture its semantics.

Yes. I'll add it.

> optional and repeats partially overlap. Why not use a single cardinality to capture both?

I don't have a strong opinion on which way is better. My main reason for separating them is to use plain, simple English for attribute names. If William has a preference, let's go with that, otherwise flip a coin?

> Having the type attribute refer to either a built-in type or a class is a neat trick to simplify the schema. It does have the drawback that nested structures must be flattened using separate class elements, even if they're only ever used inside a single class. I think this common case is going to look bad with this proposal.

Modulo the name and how to support cardinality, something like this?

classes = element classes { class* }
class = element class { name, uri?, property* }
class-ref = element class { ref }
property = element property { name, uri?, optional?, repeats?, (type | property*) }
repeats = attribute repeats { cardinality }
cardinality = "*" | "+"
optional = attribute optional { "true" | "false" }

RaySinnema commented 9 years ago

I agree we should focus on getting the structure right first (see #6 and #32) and treat the name as a separate issue (see #31).

RaySinnema commented 9 years ago

I'm still afraid the loss of nesting is going to hurt. Maybe we should code up some examples in this proposed schema and see how much it bothers us to have to flatten all the classes.

jonathanrobie commented 9 years ago

> I'm still afraid the loss of nesting is going to hurt. Maybe we should code up some examples in this proposed schema and see how much it bothers us to have to flatten all the classes.

This nests (and is taken directly from the schema above):

property = element property { name, uri?, optional?, repeats?, (type | property*) }

A property can have either a type (an attribute that is the name of a built in type or a class) or a sequence of property elements.
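
A minimal sketch of the two alternatives, assuming name, uri, type, and repeats are attributes and that "string" and "number" are built-in type names; the class name, property names, and example.com URI are hypothetical.

```xml
<!-- Sketch only: names, URI, and built-in type names are assumptions. -->
<classes>
  <class name="order" uri="http://example.com/semantics/order">
    <!-- No type attribute: the structure is given by nested property elements. -->
    <property name="item" repeats="+">
      <property name="name" type="string"/>
      <property name="quantity" type="number"/>
      <property name="price" type="number"/>
    </property>
    <!-- A type attribute instead of nested content. -->
    <property name="total" type="number"/>
  </class>
</classes>
```

With nesting allowed, the item structure no longer needs a separate class unless it is shared between classes.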

We definitely need sets of examples. I was planning to work on that after returning from a run later this morning.

jonathanrobie commented 9 years ago

Could a state have both a name and an object without requiring the user to define a new class? Or could a state include more than one address? I suspect that might be useful.

jonathanrobie commented 9 years ago

> Yes, we still need schema languages for concrete representations (i.e. serialized "data models"). I was strictly talking within the context of the "data model".

Got it. Schema languages are the wrong level of abstraction for the data model.

RaySinnema commented 9 years ago

> This nests (and is taken directly from the schema above)

Ah, my bad. Yes, I like that.

RaySinnema commented 9 years ago

> Could a state have both a name and an object without requiring the user to define a new class? Or could a state include more than one address? I suspect that might be useful.

Do you mean "state" as in the <state> element? If so, then I don't understand the "object" and "address" you refer to. If not, then I don't understand what you mean by "state".

jonathanrobie commented 9 years ago

Here's the current status, with examples.

First the schema:

#
# Property groups
#

property-groups = element property-groups { property-group* }
property-group = element property-group { name, (property| property-ref)* }
property-ref = element properties { attribute ref { string } }
property = element property { name, property-type?, uri?, optional?, repeats?, (property| property-ref)* }
property-type = attribute type { ptype }
ptype = "string" | "number" | "boolean" | string
optional = attribute optional { "true" | "false" }
repeats = attribute repeats { cardinality }
cardinality = "*"| "+"

Some examples:

  <property-groups>
    <property-group name="items">
      <property name="items">
        <property name="item" repeats="+">
          <property name="name" type="string"/>
          <property name="quantity" type="number"/>
          <property name="price" type="number"/>
        </property>
      </property>      
    </property-group>

    <property-group name="menu">
      <properties ref="items"/>
    </property-group>

    <property-group name="order">
      <properties ref="items"/>
    </property-group>

    <property-group name="receipt">
      <properties ref="items"/>
    </property-group>

  </property-groups>

Property groups can also be referred to from states:

    <state name="Deciding">
      <properties ref="menu"/>
         !!! SNIP !!!

Open question (which I tried to ask in my previous post): should a state be able to reference more than one property group?

jonathanrobie commented 9 years ago

Resolved with a small variation on the above schema, per Tuesday's meeting.

property-groups = element property-groups { property-group-top* }
property-group-top = element property-group { attribute ref {string} | (name, (property | property-group )*) }
property-group = element property-group { optional?, repeats?, (attribute ref {string} | (name, (property | property-group )*)) }
property-group-ref = attribute property-group { string }
property = element property { name, property-type?, uri?, optional?, repeats? }
property-type = attribute type { "string" | "number" | "boolean" | string }
optional = attribute optional { "true" | "false" }
repeats = attribute repeats { "true" | "false"  }

  <property-groups>
    <property-group name="items">
      <property-group name="item" repeats="true">
        <property name="name" type="string"/>
        <property name="quantity" type="number"/>
        <property name="price" type="number"/>
      </property-group>
    </property-group>

    <property-group name="menu">
      <property-group ref="items"/>
    </property-group>

  <state name="Ordered" property-group="order">
      <transitions>
        <transition name="Change" to="Ordered">
          <documentation> As long as the customer hasn't paid, she can change her order. </documentation>
          <input property-group="order"/>
        </transition>

RaySinnema commented 9 years ago

I thought we'd also agreed to get rid of the optional attribute on a property.

And then there is the issue of what to name what is called a property-group in the above. I still believe that data transfer object much better describes what this concept means and how it is supposed to be used.

jonathanrobie commented 9 years ago

I don't think we had agreed to get rid of optional on a property. And without it, I don't know how we would represent optional properties. repeats is Boolean, and can't tell us if the property is optional.

I'm still not convinced that "the properties of a state" is the same thing as "a data transfer object", and I still think this mixes levels of abstraction (class vs. instance).

RaySinnema commented 9 years ago

Why do we need to know whether a property is optional?

jonathanrobie commented 9 years ago

For input, people need to know whether supplying it is required. For states, people sometimes need to know if they can rely on it always being there.

This feature was requested by one of our solutions teams.

RaySinnema commented 9 years ago

> For states, people sometimes need to know if they can rely on it always being there.

This seems to me to be a bad practice, coupling client to server and either breaking the client when the server changes, or preventing the server from evolving. The fewer assumptions clients make, the better. Since some properties will be optional, clients must check those properties. Why not be safe and check all of them?

RaySinnema commented 9 years ago

> For input, people need to know whether supplying it is required.

I can see that, but that doesn't necessarily mean we have to add something to the RADL schema. The alternative would be to add something in the <documentation>.

The advantage I see of adding it to the schema is that we can enforce consistency in the documentation by generating that part.

The disadvantage I see of adding it to the schema is that we might not want such consistency. For many properties, the description of what the property means will make it clear whether a value is required for the functionality. In those cases, adding required or optional to the documentation is redundant, but the API designer loses the ability to leave it out if we always generate it.

Another disadvantage is that it's not at all clear what a good default value for the attribute would be. In some APIs most properties will be required, while in other APIs most will be optional. If we pick optional as the default, for instance, then someone designing an API with mostly required properties needs to add a lot of markup to prevent the generated documentation from being wrong. I think RADL should save people work, not add to their workload.

Also, if we were to add an attribute for this purpose, then I'm not at all convinced we should name it optional. It makes more sense to me to use required="true" than optional="false". Compare this to the asterisks used on web forms to indicate required fields. Alternatively, we could follow XML Schema: use="required".

jonathanrobie commented 9 years ago

Just for the record, here are the decisions I recorded at the last meeting. This was displayed at the end of the meeting. Ray and I were both present and said we agreed with these decisions as written; William was on vacation. I have not changed this except to fix one typo.

- property-groups is OK
- Ray prefers DTO, Jonathan will think about this, we need to ask William
- In states, Ray prefers attributes (minor preference). Jonathan will consider.
- agreed to use property-groups for nesting as below
- repeats becomes a Boolean

There was no decision to remove the optional attribute.

jonathanrobie commented 9 years ago

Optional is used here and in other places to allow an API designer to explicitly state that something is required or optional. The default value is 'unspecified': no definite statement as to whether it is required or not. That's important, because in early stages of a design, you often don't know, and the choice of default may depend on the API (which can document a default if it needs one).

I just checked the schema, and in the other places we do this we use a required attribute rather than an optional attribute. I think we should do the same here, for the sake of consistency.
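
A small sketch of the three cases, assuming the attribute is named required and takes "true" or "false" in the same way as the optional definition above; the group and property names are hypothetical.

```xml
<!-- Sketch only: the required attribute's exact definition is an assumption. -->
<property-group name="order">
  <!-- Explicitly required. -->
  <property name="item" type="string" required="true"/>
  <!-- Explicitly optional. -->
  <property name="note" type="string" required="false"/>
  <!-- Attribute omitted: 'unspecified', no statement either way. -->
  <property name="loyalty-card" type="string"/>
</property-group>
```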

RaySinnema commented 9 years ago

I think the 'unspecified' default solves my issues. Thanks for pointing that out.

jonathanrobie commented 9 years ago

I believe the only remaining question is whether there's a better name than property-groups?

RaySinnema commented 9 years ago

Yes from my perspective.

gentlewind commented 9 years ago

As far as I can see, property-groups are used in two places:

- the transfer data input in a state transition (the properties are not necessarily part of the target resource)
- the persistent data that describes the target resource state

property-groups is a safe name. Alternatives such as properties, state-entity, or data-object don't seem to make much difference.

jonathanrobie commented 9 years ago

This is already in the schema, using property-groups. Can we accept it as is?

RaySinnema commented 9 years ago

+1

gentlewind commented 9 years ago

+1

RaySinnema commented 9 years ago

Two little things I noticed only just now. First, we need to be able to name references, e.g.:

<property-group name="address"> ... </property-group>

<property-group name="contact">
  <property-group name="post address" ref="address"/>
  <property-group name="visit address" ref="address"/>
</property-group>

This was the example used in #32.

This is not valid according to the current schema:

property-group = element property-group { 
    required?,
    repeats?,
    (attribute ref {string} | (name, (property | property-group )*))
}

Second, we need to be able to add semantics to top-level property-groups, e.g.

<property-group name="home" uri="http://schema.org/CafeOrCoffeeShop"/>

That isn't valid according to the current schema:

property-group-top = element property-group { 
    attribute ref {string} | (name, (property | property-group )*)
}

I propose to fix both issues as follows:

property-group-top = element property-group { 
  uri?,
  (property-group-internal-ref | (name, (property | property-group )*))
}
property-group-internal-ref =  attribute ref { xsd:string }
property-group = element property-group { 
  name, 
  required?, 
  repeats?,
  uri?,
  (property-group-internal-ref | (property | property-group)*) 
}
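
Under this proposed grammar, both earlier examples should validate together. The sketch below assumes that name, uri, required, and type are attributes, that property keeps its previous definition, and that property-groups remains the container element; the street and city properties and the choice of schema.org URI are illustrative only.

```xml
<!-- Sketch only: property names and the URI are illustrative assumptions. -->
<property-groups>
  <!-- Top-level group with semantics attached through uri. -->
  <property-group name="address" uri="http://schema.org/PostalAddress">
    <property name="street" type="string"/>
    <property name="city" type="string"/>
  </property-group>

  <property-group name="contact">
    <!-- Named references: two uses of the same group under different names. -->
    <property-group name="post address" ref="address"/>
    <property-group name="visit address" ref="address" required="false"/>
  </property-group>
</property-groups>
```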

jonathanrobie commented 9 years ago

+1

RaySinnema commented 8 years ago

This is the current schema for start-state:

start-state = element start-state {
  documentation*,
  property-group-ref?,
  state-transitions?
}

I don't think it should have a property-group-ref, since it's not really a state. It only serves to hold a transition to the home state (i.e. the GET on the billboard URI). So my proposal is:

start-state = element start-state {
  documentation*,
  state-transitions?
}
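
With that change, a start-state would carry only documentation and its transition. The sketch below assumes that state-transitions corresponds to a transitions element with transition children, as in the Ordered state example earlier in this thread; the transition name and the Home state name are hypothetical.

```xml
<!-- Sketch only: element structure and names are assumptions. -->
<start-state>
  <documentation>Clients enter the API with a GET on the billboard URI.</documentation>
  <transitions>
    <transition name="Start" to="Home"/>
  </transitions>
</start-state>
```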

jonathanrobie commented 8 years ago

Removed property-group-ref from start-state.

restful-api-description-language / RADL

Merge data-models and properties #34
