microsoft / typespec

https://typespec.io/
MIT License
4.44k stars 208 forks source link

Add model constraints #961

Open bterlson opened 2 years ago

bterlson commented 2 years ago

A feature that Bosque has that I covet is the ability to describe constraints on the values of a model in order to e.g. generate code to construct a model that ensures that the data inside the model is correct. We have decorators for simple scenarios but there are many cases where the validations are more complex than the decorators can handle.

Imagine syntax like the following:

model AccountOwner {
  name: string;
  age: int8;

  ensure age > 18;
  ensure name.length > 2;
}

model Account {
  balance: int32;
  kind: "joint" | "single";
  owners: AccountOwner[];
  ensure balance > 0;
  ensure if (kind == "joint") { owners.length > 1 }
}

Syntax wise I think we leverage what we've already done for projections. Semantically, I think these clauses are validated but not run in Cadl. Emitters can access the AST and convert these expressions into target languages.

There are various constraints that don't necessarily apply to projections to keep in mind, i.e. these expressions must be pure in the sense that they should not make modifications to the cadl program at all.

mrkmarron commented 2 years ago

This looks like a great feature. Tons of value in being able to add and check these constraints. Bosque is also working on tools to use them to do structured input fuzzing, semantic diffing for versioning, and auto-generating mocks of services. Would be great to expose a hook for using other languages (e.g. Bosque) for writing these logical constraints.

I opened issue #505 in the Bosque repo to track things on the Bosque side.

timotheeguerin commented 2 years ago

Think we'd need to make sure we can do this for model is, maybe just using self:

model Foo is int32 {
  ensure self > 32;
  ensure self < 64;
}

this feels like partially related to this prospal #602 but we have this constraint as cadl language

bterlson commented 2 years ago

That's a good point. @mrkmarron is that something Bosque can express, say an integer type with values between 32 and 64?

mrkmarron commented 2 years ago

Yeah, definitely possible. In fact Bosque has special support for doing typedecl on most primitive types (Nat, Int, String, ...):

typedecl Foo = Int & {
    invariant $value > 32i;
    invariant $value < 64i;
}

We are basically using the special name $value as the "self". Just keeping in line with the motivation that no invalid instantiations of Foo can ever exist -- i.e. the type of $value is Int not Foo.

markcowl commented 2 years ago

I wonder if we can express similar kinds of constraints for operations? For example, asserting that casing / values are identical across a request and response, or asserting conditional response values based on request values. This strikes me a s useful in service testing.

mrkmarron commented 2 years ago

I spent a bit of time on a proof-of-concept implementation of this feature last week. The fork is here. I took the example @bterlson had and tweaked it a bit to try out some various things, including pre/post constraints like @markcowl mentioned.

@invariant("$age > 18i")
@invariant("!$name.empty()")
model AccountOwner {
  name: string;
  age: int8;
}

enum AccountKind {
  joint,
  single,
}

@invariant("$balance > 0i")
@invariant("$kind === AccountKind::joint ==> $owners.size() > 1n")
model Account {
  @pattern("^[0-9]{4}-[a-z0-9]{7}$") accountID: string;
  balance: int32;
  kind: AccountKind;
  owners: AccountOwner[];
}

enum ErrorKind {
  network,
  access,
  invalid,
}

model Error {
  info: ErrorKind;
}

interface AccountService {
  list(): Account[] | Error;

  read(@pattern("^[0-9]{4}-[a-z0-9]{7}$") id: string): Account | Error;

  @ensures("$return.is<Account>() ==> $return.accountID === id && $return.kind === AccountKind::joint && $return.owners.someOf(pred(oo) => oo.name === owner)")
  addOwner(@pattern("^[0-9]{4}-[a-z0-9]{7}$") id: string, owner: string): Account;
}

The @invariant, @requires, and @ensures annotations and the code they run should be pretty self explanatory. I haven't checked the full projections expression language but from my experiments it looks like the expression language and CADL shapes map pretty well into Bosque.

From this spec we can (of course) generate executable bosque code to validate the pre/post/invariant conditions at runtime. In addition Bosque also is also designed explicitly with solver support so we there are (at least) 3 scenarios that this would also light up -- fuzz input generation, auto service test mock generation, and API delta generation. I have implemented the first 2 of these and can explain the third.

Fuzzing: Given just the specs we can solve for valid inputs, e.g., for read we might generate "0000-0000000 or 1234-a999bcd there are ways we can easily explore generating diverse input sets -- via fixed value region sampling, genetic mutation operators, or GPT-3 style natural input sampling.

Auto-Mocking: From the specs we can solve for a valid output given a concrete input. So, if you want to test against a service you use, instead of writing mocks by hand, you can auto generate them on demand as your test runs and makes calls to various service interface endpoints. In this example you might call addOwner and want to get back a result that satisfies all the constraints of the @ensures clause given the values of the input. Again we will probably want some ways to control or sample from the possible outputs, i.e., always assume success vs. allow errors or generate adversarial values vs. natural happy path results.

I have a little animation of these scenarios working here:

In the third scenario you can imagine semantic changes to a model occurring, like allowing new account ids say where they may have more digits or allow an optional prefix. In this case we can use the same underlying solver to generate inputs (or outputs) that are not valid in version A but are now valid in version B to help test for regressions when updating dependecies.

abatishchev commented 1 year ago

Here's a scenario that might fall under this category too.

Given:

{
  "properties": {
    "auth": {
      "cert": { },
      "aad": { }
    }
  }
}

I want to be able to require at least one properly specified: either cert or aad or (what's important) both.