ucsd-progsys / liquidhaskell

Liquid Types For Haskell
BSD 3-Clause "New" or "Revised" License
1.18k stars 134 forks source link

Using Type Families for Specifications #340

Open ranjitjhala opened 9 years ago

ranjitjhala commented 9 years ago

Thanks to @nikivazou efforts with {-@ measure @-} and {-@ inline @-} 90%of specs are now pure haskell. The last thing was {-@ type ... @-}.

Recall the discussion we were having earlier -- about using aliases with Haskell type params for refinements -- we were stumped on the issue of binders and wanted to ask @simonpj (before my back gave out...) if we could hope for some eventual support from GHC for that.

Today, @spinda had a very nice idea that I think avoids the need for special support: in a nutshell, to use type families and type level strings.

For example, consider this:

{-@ foo :: x:Int -> {v:Int | v > x } @-}
foo x = x + 1

The idea is to create two type families:

type family (:) x t where
   x : t = t

type family (|) t p where 
  t | p = t 

Essentially : represents "bind" and | represents "such-that" so the type above would be written as:

foo :: "x":Int -> Int | ("v" > "x")
foo x = x + 1

Or, perhaps even better, with an alias:

type IntGT x = Int | ("v" > x)

foo :: "x":Int -> IntGT "x"
foo x = x + 1

Of course,

The great thing about this proposal is that we don't have to have our own broken mechanisms for name resolution etc., but can reuse GHC's wholesale for this purpose...

Thoughts?

gridaphobe commented 9 years ago

There are a couple of (syntactic) problems with this unfortunately.

  1. | is a reserved character, so we cannot make it a type family
  2. : already exists at the type level (as "cons")

There is of course a simple solution, use different symbols, but some people (@nikivazou?) won't be happy about this.

I'm also not a big fan of using string literals as we'd have to perform scope-checking ourselves, but this is minor.

ranjitjhala commented 9 years ago
  1. Sure, we'd have to use something other than |, :.
  2. We have to do scope checking ourselves anyway, i.e. this is no worse than what we currently have (except for the extra quotations.

On Fri, Feb 13, 2015 at 3:51 PM, Eric Seidel notifications@github.com wrote:

There are a couple of (syntactic) problems with this unfortunately.

  1. | is a reserved character, so we cannot make it a type family
  2. : already exists at the type level (as "cons")

There is of course a simple solution, use different symbols, but some people (@nikivazou https://github.com/nikivazou?) won't be happy about this.

I'm also not a big fan of using string literals as we'd have to perform scope-checking ourselves, but this is minor.

Reply to this email directly or view it on GitHub https://github.com/ucsd-progsys/liquidhaskell/issues/340#issuecomment-74347849 .

Ranjit.

gridaphobe commented 9 years ago

That's why I called it a minor issue :smile:

It would just be nicer if GHC were to handle the scope-checking with our hypothetical language extension (which I'm sure Richard will need eventually for his Dependent Haskell work).

nikivazou commented 9 years ago

The Dependent Haskell idea is to have subset (or Refinement) Types, so we should discuss the syntax that the Dependent Haskell people have in mind.

It would be nice if we both supported the same syntax and the user could switch between liquidHaskell and dependent types.

gridaphobe commented 9 years ago

@nikivazou yes that's exactly what I've been envisioning!

spinda commented 9 years ago

Anyone know what syntax they're planning on using? There's a Trac page here and a repository here, but I haven't been work that out from either.

I agree that it would be better to use a native mechanism for binders/etc., but it looks like it's going to be a long time before DependentHaskell can be merged into GHC. If something like what's described here were implemented, parts could be replaced by the corresponding GHC features as they get merged in. Until then, this would resolve all of our current issues with name resolution and alias expansion, and most of our issues with parsing and error messages.

ranjitjhala commented 9 years ago

I agree with @spinda -- this seems:

1 easy to do right now 2 major benefits re resolution 3 and at any rate doing this does not in any way preclude using native haskell binders as and when they get added down the line.

The only issue I have is whether we can make it look nice and not add too much noise, and I suspect we can...

On Feb 13, 2015, at 4:33 PM, Michael Smith notifications@github.com wrote:

Anyone know what syntax they're planning on using? There's a Trac page here and a repository here, but I haven't been work that out from either.

I agree that it would be better to use a native mechanism for binders/etc., but it looks like it's going to be a long time before DependentHaskell could be merged into GHC. If something like what's described here were implemented, parts could be replaced by the native GHC features as they get merged in. Meanwhile, this would resolve all of our current issues with name resolution.

— Reply to this email directly or view it on GitHub.

gridaphobe commented 9 years ago

The time-to-release point is compelling, if we extended GHC we probably couldn't get the work in before 7.12 (which would be about a year from now). So I'd suggest we pursue both angles :)

spinda commented 9 years ago

@gridaphobe Agreed! The two ideas complement each other, I think. This one gets us half-way to the other.

spinda commented 9 years ago

There's an example of how this might look at tests/todo/TypeSynonyms.hs

spinda commented 9 years ago

Writing up some notes based on previous discussions and my own further research. Apologies up-front for the length.


Encoding more complex elements of LiquidHaskell's type system (bounds, etc) in type synonyms is possible, but structuring this in a way that is nice for the user to write by hand isn't really feasible.

A quasiquoter can be used to bridge this gap. Say lq is a quasiquoter in some module LiquidHaskell; then, code like this:

{-@ type Nat = { v:Int | v >= 0} @-}

{-@ add :: x:Nat -> y:Nat -> { v:Int | v = x + y } @-}
add :: Int -> Int -> Int
add x y = x + y

becomes:

import LiquidHaskell

type Nat = [lq| { v:Int | v >= 0} |]

add :: [lq| x:Nat -> y:Nat -> { v:Int | v = x + y } |]
add x y = x + y

Note how this version doesn't require a duplicate type signature for add, and also note that Nat can be used in normal Haskell code with normal namespacing.

The quasiquoter lq is run automatically by GHC during compile-type (no need for custom modifications or special configuration), producing the type synonym encoding from the source signatures passed to it; something like:

-- LiquidHaskell internal definitions
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits

data Brel = Eq | Ne | Gt | Ge | Lt | Le | Ueq | Une
data Pred = PAtom Brel Expr Expr

data Constant = I Nat
data Bop  = Plus | Minus | Times | Div | Mod
data Expr = ECon Constant | EBin Bop Expr Expr | EVar Symbol

type Bind (x :: Symbol) a = a
type Refine a (r :: Pred) = a
-- Transformed code (within GHC)

type Nat = Refine (Bind "x" Int) (PAtom Ge (EVar "v") (ECon (I 0)))

add :: Bind "x" Nat -> Bind "y" Nat -> Refine (Bind "v" Int) (PAtom Eq (EVar "v") (EBin Plus (EVar "x") (EVar "y")))
add x y = x + y

As far as the rest of the GHC pipeline is concerned, this boils down to:

type Nat = Int

add :: Int -> Int -> Int
add x y = x + y

but from LiquidHaskell, we can access GHC's represention of these types, which retains the synonym information.

This removes the need for duplicate data declarations, eg.

data Thing { count :: Nat }

vs

data Thing { count :: Int }

{-@ data Thing { count :: Nat } @-}

Handling locals becomes much more straightforward as well: instead of looping through the CoreBinds to guess which var a comment corresponds, we can simply pull the signature from GHC. This removes an inconsistency between LiquidHaskell and Haskell, as the current implementation of LH's local signatures doesn't follow the same rules as GHC.

Other elements of LiquidHaskell syntax could potentially be replaced by fully-native encodings. For example, embeds could be specified as GHC annotations:

{-@ embed GHC.Types.Int as int @-}

becomes something like:

{-# ANN type Int EmbedAs "int" #-}

Measures within {-@ @-} can be replaced with lifted Haskell measures, possibly using annotations in place of custom-implemented comments:

{-# ANN len Measure #-}
len :: [a] -> Nat
len [] = 0
len (x:xs) = 1 + len xs

This could also be written with a quasiquoter shortcut, if preferred:

[measure| len :: [a] -> Nat |]
len [] = 0
len (x:xs) = 1 + len xs

which would expand to the same code as above through a very simple transformation.

Not sure about invariants. Those might have to stay as special comments.


Signature processing under this system can be broken down into a few passes:

  1. GHC invokes the lq quasiquoter on source type signatures. lq parses these signatures and outputs a type synonym-encoded AST within Type.
  2. GHC handles type synonym expansion, type variable substitution, etc. automatically. (no new code needed)
  3. LiquidHaskell iterates through types from GHC, rewriting from Haskell types and type synonym AST nodes to the internal type representation.
  4. LiquidHaskell performs a name resolution pass, resolving eg. measures using GHC's resolution context. (adapt from existing code)
  5. LiquidHaskell peforms a validation pass, doing sort checking and so on. (adapt from existing code)

Because lq only handles signatures in isolation, the parsing logic can become much simpler than what we have now. The rest of the parsing is handled by GHC, eliminating the rest of the parser quirks and error cases in the current implementation.


There is one catch, though: we need to retain position information relative to the quasiquoter in the raw Haskell source for error reporting. Fortunately, quasiquoters give us access to qLocation, which passes through the source span of the quasiquoted text. This can be used as the start location in lq's parser; then position information will be included in the type synonym AST. Something like:

-- LiquidHaskell internal definitions
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits

data Pos = Pos { posLine :: Nat, posCol :: Nat }
data Span = Span { spanFile :: Symbol, spanStart :: Pos, spanEnd :: Pos }
data Located a = Loc Span a

data Brel = Eq | Ne | Gt | Ge | Lt | Le | Ueq | Une
data Pred = PAtom (Located Brel) (Located Expr) (Located Expr)

data Constant = I Nat
data Bop  = Plus | Minus | Times | Div | Mod
data Expr = ECon Constant | EBin (Located Bop) (Located Expr) (Located Expr) | EVar Symbol

type Bind (x :: Located Symbol) a = a
type Refine a (r :: Located Pred) = a
-- Transformed code (within GHC, complexity hidden from user)

type Nat = Refine (Bind (Loc (...) "x") Int) (PAtom (Located (...) Ge) (Loc (...) (EVar "v")) (Loc (...) (ECon (I 0))))

add :: Bind (Loc (...) "x") Nat -> Bind (Loc (...) "y") Nat -> Refine (Bind (Loc (...) "v") Int) (PAtom (Loc (...) Eq) (Loc (...) (EVar "v")) (EBin (Loc (...) Plus) (Loc (...) (EVar "x")) (Loc (...) (EVar "y"))))
add x y = x + y

Thanks to the quasiquoter, this extra complexity is handled automatically and hidden from the user.


Implementation of lq, the type synonym AST, and the transformation pass can be done outside of LiquidHaskell, as they're all independent of its inner workings. This functionality could actually be packaged in a separate module from the rest of LiquidHaskell. Then the LiquidHaskell type AST, extraction from Haskell types, pretty printing, etc. could be reused in plugins for, say, GHCi and Haddock to display LH types to the user. Otherwise these tools will show the plain Haskell types as they do now.

gridaphobe commented 9 years ago

I've wanted to extract the core LH types from the main package for a while too, so I can inject target between liquidhaskell-types and liquidhaskell, thus adding a simple form of counterexample generation to LH.

Another thing that would be really nice along these lines is if the lq quasiquoter could add the refined type as a top-level value, e.g.

[lq| add :: x:Nat -> y:Nat -> { v:Int | v = x + y } |]
add x y = x + y

would expand to something like

add_type :: SpecType
add_type = [lq| x:Nat -> y:Nat -> { v:Int | v = x + y } |]

add x y = x + y
{-# ANN add Refined add_type #-}

If we reify the refined types to values in the module (at compile time), then we don't need to invoke GHC at LH run time, which is another source of pain as we have to reproduce the correct GHC options for our invocation.

spinda commented 9 years ago

In the reified case, wouldn't we still have to invoke GHC to load the module and access the reified type?

ranjitjhala commented 9 years ago

We have to invoke GHC (in LH) to get the core and many other things...

On May 25, 2015, at 1:32 PM, Eric Seidel notifications@github.com wrote:

I've wanted to extract the core LH types from the main package for a while too, so I can inject target between liquidhaskell-types and liquidhaskell, thus adding a simple form of counterexample generation to LH.

Another thing that would be really nice along these lines is if the lq quasiquoter could add the refined type as a top-level value, e.g.

[lq| add :: x:Nat -> y:Nat -> { v:Int | v = x + y } |] add x y = x + y would expand to something like

add_type :: SpecType add_type = [lq| x:Nat -> y:Nat -> { v:Int | v = x + y } |]

add x y = x + y {-# ANN add Refined add_type #-} If we reify the refined types to values in the module (at compile time), then we don't need to invoke GHC at LH run time, which is another source of pain as we have to reproduce the correct GHC options for our invocation.

— Reply to this email directly or view it on GitHub.

gridaphobe commented 9 years ago

Ah yes, I guess that would only actually help for target since I only need the types and measure definitions.

gridaphobe commented 9 years ago

Actually, if we were to make the specifications into annotations as I suggested, we could make LH a Core-to-Core plugin. Then GHC would call us once the Core has been generated.

It might be worth putting together a list of everything we extract from GHC's API to see what else we would need.

spinda commented 9 years ago

If we don't insert the specifications as Haskell types, then we still have to do our own synonym expansion, resolution, substitution, etc, instead of having GHC take care of that for us. But I don't think we would need to use annotations exclusively to set LH up as a GHC Plugin (which seems like a good idea). If it helps with target, we could expose specifications as both types and annotations.

gridaphobe commented 9 years ago

Why wouldn't we just have GHC do all of the expansion for us as per your proposal and stick the result in the module as a top-level value?

spinda commented 9 years ago

I think we might be saying the same thing, actually... But it'd probably be easier to discuss this in person tomorrow.

gridaphobe commented 9 years ago

Ok :)

ranjitjhala commented 9 years ago

This looks quite nice IMO, lets see if we can fill out all the details...

ranjitjhala commented 9 years ago

@spinda -- @nikivazou and I were pondering this a bit today, and we think a further simplification of your approach is to:

(1) LIMIT TH/Splices to type alias definitions, (2) USE plain haskell signatures for functions.

This would let us write things like

{-# LANGUAGE TypeOperators #-}

module Foo where

type IntGE x  = Int // [lq| {v | x <= v}]
type Plus x y = Int // [lq| {v | v == x + y}]

type (#)  b t = t
type (//) t r = t

foo :: x # Int -> y # Int -> Plus x y
foo x y =  x + y

That is, we ONLY use splices for refinements, inside type alias definitions (since that is what gets super cumbersome in pure Haskell.) But the rest is just plain Haskell.

What do you think? (note that the stuff above compiles right now with GHC, with the desired behavior as long as you replace the splices with something valid e.g. Int)

On Tue, May 26, 2015 at 11:17 AM, Ranjit Jhala jhala@cs.ucsd.edu wrote:

This looks quite nice IMO, lets see if we can fill out all the details...

Ranjit.

gridaphobe commented 9 years ago

This might be a reasonable first goal, but I don't like the idea of limiting refinements to type aliases. That adds greatly to the cognitive overhead of using refinement types because now I have to come up with a name for each refinement I want to use.

My approach to developing with refinement types is quite the opposite. I start with "inline" refinements everywhere, and then try to figure out clean aliases once the dust has settled.

ranjitjhala commented 9 years ago

No reason why this proposal would preclude the function signatures, I think you're right we should just start with type aliases because that was already take us quite far and then we can do Michael's full proposal.

On May 29, 2015, at 7:05 PM, Eric Seidel notifications@github.com wrote:

This might be a reasonable first goal, but I don't like the idea of limiting refinements to type aliases. That adds greatly to the cognitive overhead of using refinement types because now I have to come up with a name for each refinement I want to use.

My approach to developing with refinement types is quite the opposite. I start with "inline" refinements everywhere, and then try to figure out clean aliases once the dust has settled.

— Reply to this email directly or view it on GitHub.