microsoft / typespec

https://typespec.io/
MIT License
4.1k stars 193 forks source link

Validate/Invariant for models and operations #2041

Open marron-at-work opened 1 year ago

marron-at-work commented 1 year ago

Problem Overview

Currently TypeSpec provides a standard set of decorators (@minValue, @pattern, etc.) for specifying additional logical properties on data. Developers can also, manually, implement custom decorators for specialized logical concepts. However, there is no simple and expressive way to provide what are traditionally called type/data invariants and operation pre/post conditions.

This issue is intended to be the basis for work on adding functionality to support this scenario via the introduction of explicit keywords for adding invariants to models and pre/post conditions to operations in the form of logical conditions expressed as simple expressions in a programming language. The following sample provides a hypothetical vision of how this might look:

model AccountOwner {
  name: string;
  age: int8;

  validate(age > 18, "age is less than 18");
  validate(!name::empty(), "empty name is not allowed");
}

model Account {
  balance: int32;
  kind: "joint" | "single";
  owners: AccountOwner[];

  validate(kind == "joint" ==> owners::length > 1);
}
op read(count: int32): string | null
  validate(count >= 0, "negative count")
  validate(result != null ==> string::length <= count) 
; 

These validate clauses can be processed by emitters to generate executable runtime validation logic as standalone code or part of the parser/validation logic for an application. As they are first class fields in the models/op definitions they can also be annotated with decorators. Two common use cases might be separating validations that are for testing only vs. release mode or identifying validations that are critical to the application correctness vs. that specify some particular implementation behavior that is not semantic version critical:

model Account {
  accountId: int64;
  balance: int32;
  kind: "joint" | "single";
  owners: AccountOwner[];
  transactions: Transaction[];

  @release()
  validate(kind == "joint" ==> owners::length > 1);

  @test()
  validate(transactions::allOf((t) => t.sourceAccount == accountId, "Testing -- check all transaction ids match")
}

op read(count: int32): string | null
  validate(count >= 0, "negative count")
  validate(result != null ==> string::length <= count)

  @implementationLimit()
  validate(count <= 1024, "Performance is best on small buffers -- please transfer less and 1024 values per request"
; 

In addition, as outlined below, via a careful construction of the expression language we can ensure that these clauses are in friendly fragments of first-order logic that can be efficiently modeled to discharge checks like -- are the set of validate clauses mutually exclusive, is a change to a clause semver major/minor. We can also build models of values to create inputs that will satisfy them for tasks like fuzzing or automatically generating mocks. Related issue #961.

Design

The design of this feature involves the addition of 2 keywords (validate and result), the specification of the expression language syntax, and the semantics/reference implementation of the expression language.

Tasks

nguerrera commented 1 year ago

Are we going to make the new keywords contextual to avoid breaking changes?

nguerrera commented 1 year ago

I heard elsewhere that we need the validations to have IDs so that they can be addressed by augment decorators. Having all 3 (ID, Expr, Message) for every validation seems like it will get hard to read.

Can the message be taken from the doc comment / @doc?

I'm thinking something like this:

model Account {
  accountId: int64;
  balance: int32;
  kind: "joint" | "single";
  owners: AccountOwner[];
  transactions: Transaction[];

  /** A joint account must have more than one owner */
  validate moreThanOneJointOwner: kind == "joint" ==> owners::length > 1;
}

Then the error message could also be moved to a "sidecar" file as an augment decorator:

@@doc(Account.moreThanOneJointOwner, "A joint account must have more than one owner.")

I also experimented above with a syntax without parens around the validate args. I think validate(...) looking like a function call is a bit confusing.

marron-at-work commented 1 year ago

The validate with a field "as the id" is interesting. I am a big fan of sidecars for metadata and template strings for use cases like this but the approach should also work with inline @doc annotations as well. Plus, I can still do a little LL(k) lookahead to keep the validate keyword context dependent. E.g. the following would be just fine:

model M {
    validate: int32;
    validate between1and10: 1 <= validate && validate <10;
} 

I'll plan to start implementing a rough version this week to give a bit more concrete to play with.

marron-at-work commented 1 year ago

I took a pass at the changes needed for an initial validate keyword for model statements only -- branch here.

I like how it works generally but would love to get some eyes on it to see if I am doing anything dangerous (it touches a lot of places and I may have missed something important). The syntax is:

model Account {
  owner: string;
  balance: uint64;
  id?: string;

  validate chkname: owner::length() >= 0;

  @doc("sure")
  validate chkbal: balance >= 0;
}

A couple of items:

Any feedback is greatly appreciated. Otherwise I'll continue to improve/extend/test in this direction.