unisonweb / unison

A friendly programming language from the future
https://unison-lang.org
Other
5.81k stars 271 forks source link

Automatic code generation via 'tilde conversions', enabling abilities as typeclasses #765

Open atacratic opened 5 years ago

atacratic commented 5 years ago

This is a proposal for Unison to support a form of automatic code generation.

The mechanism is similar to Scala's implicit conversions, but with some key changes to make it more workable. It's also similar to Haskell's typeclass constraint-solving mechanism.

The application I have in mind is to enable bounded polymorphism via abilities (as per @runarorama's idea). That's what motivates the example below, but [updated with link] I discuss the specifics of that application in this comment on issue 502.

Proposal

~ means 'Unison please try and write a conversion function for me'

The tilde character (~) gets a special meaning, 'try and write a conversion function'. It acts like a function ~ : A -> B, where A and B are to be resolved during typechecking. It gets replaced during compilation with a term that does the required conversion (or else compilation fails). This generated term is just a regular function, composed from other functions in the tilde conversion set using a type-driven search process (discussed under 'conversion search' below).

Conversion functions are composed from functions in the 'tilde conversion set'

The tilde conversion set is a configured set of terms, with types of the form A -> B for various A and B. The set is taken from the set of terms directly underneath some namespace, which defaults to .base.conversions. The namespace to use is configuration that is part of the ucm session. ucm can be configured to use a different set of conversions, or no conversions at all (to disable this feature).

Example - automatically deriving a handle expression for the 'typeclasses' use case

Below is an example of using ~ to (essentially) write a handle expression for us, as part of discharging a typeclass-style constraint. The key bit is the ~(hello a) line, where ~ is causing Unison to generate a call to showTextConversion, which handles the Show Text ability that hello is asking for.

use .base
use .base.io

hello : a ->{IO, Show a} ()
hello a = printLine ("Hello " ++ show a)

foo : '{IO} ()
foo = 'let 
        a = "Alice"
        -- Use ~ to discharge the `Show Text` constraint
        -- This compiles today if you spell out !(showTextConversion '(hello a))
        ~(hello a)

-- Requires this to be in the tilde conversion set:
showTextConversion : '{Show Text, e} b -> '{e} b
showTextConversion b = '(handle showTextHandler in !b)

-- boring stuff follows
ability Show a where
  show : a -> Text

showTextHandler : Request (Show Text) a -> a
showTextHandler r = case r of
  { Show.show t -> k } -> handle showTextHandler in k t
  { a } -> a

There's another example in the gist here showing some non-trivial code generation, to satisfy a Show [Text] constraint.

tilde conversions are 'not part of the language'

During typechecking / code generation, the ~ is replaced with some actual code (a term of function type). It's this that actually gets put into the codebase. It's tagged with a metadata flag, in a similar way as with inline comments. That tag lets Unison treat it specially on view/edit.

When Unison is printing a term for us (say forview/edit), it first does typechecking and code generation again. It takes the code, replaces ~-tagged subterms with actual tildes, and tries to compile the result, using the current tilde conversion set configuration. For each ~, if it gets the same result as was in the codebase, it goes ahead and prints out a ~. If the result is not (structurally) the same term, then it prints out the whole term as it is in the codebase, no ~, no funny business. In this case the user sees the actual code that was originally generated before the add. This can happen if the tilde conversion set is different, or user doing the view/edit is running with tilde conversions disabled.

Commentary

Properties of this proposal

Possible issues/concerns/questions

Use cases and rationale

Use cases...

Rationale:

Conversion search

I'd propose that this mechanism

Suppose your tilde conversion set contains the following.

p : U -> V
q : V -> W
r : B a -> C a
s : C a -> H a -> D a
t : B a -> H a

Then it will

Note that the search probably has to play about with ! and ', as it did in the foo example above.

There's clearly precedent here from Haskell constraint-solving, and Scala implicit search. I'm sure there's a bunch of literature... I haven't had a look though. Probably this is a hard but well-studied problem with some off-the-peg answers?

The addition of effect typing is a fun extra element though.

I guess ideally one would have a proofs that the search terminates, produces a unique canonical result (of the correct type), and succeeds if it's possible to do so - and know what conditions those things impose on the tilde conversion set. I would propose not sweating this too much though, at least not in a formal way.

Maybe if we're initially focussing this on the typeclass use case, then we can start of with implementing a very simplified search where we just try and find things to eliminate the E in A ->{E} B.

Possible extensions

Alternative approaches

atacratic commented 5 years ago

Note that there's now an accompanying discussion of the application to the typeclass use case in issue 502 here.

Another issue: how does tilde conversion interact with type inference? If you leave off the signature from foo in the original example:

foo = 'let 
        a = "Alice"
        ~(hello a)

then it's going to have a hard time working out whether you meant '{IO} () or '{IO, Show Text} (). Generally ~ introduces a degree of freedom which I'm not sure how the typechecking algorithm can cope with. Maybe if the types aren't constrained enough to pin down the A and B in ~ : A -> B, then it should throw an error asking you to add a signature or annotation. Would that be a distressing weakening of our inference story?

aryairani commented 5 years ago

Nice write-up; I didn’t notice it until your recent comment. I had a couple thoughts:

when it has a choice between A -> B and A -> C, throws a failure.

Did you mean, when it has a choice between A -> X -> B and A -> Y -> B?

Second, it would be great if you could implement this in pure Unison using an as-of-yet nonexistent API/support for ucm extensions.

atacratic commented 5 years ago

We discussed on slack: you've spotted that the example is broken, because it doesn't follow the rule you quoted. Specifically, when trying to find a B a -> D a, it should give up due to having both r and t to choose from. Not sure if the rule is wrong or just the example, but it seemed to both of us this was just a superficial glitch, and that the idea can be made to work.

And yes being able to implement this through some kind of metaprogramming / language plugin kind of API would be great!

anovstrup commented 4 years ago

@atacratic This is a neat idea!

In the near-term, it does seem prudent to focus on a simpler subset of this capability, limited to discharging abilities. "Conversions" could be limited to those of the form Request a t -> r, and ~ expr could be rewritten as handle handle ... handle expr with h_n ... with h_2 with h_1, where the h's are all the applicable handlers for expr. I think this would be a lot simpler on the search, type inference, and type checking front.

I also wonder whether, even for the general case, it would be better to have a conversion list (or partially ordered set)? For handlers, the order they're applied can be significant, and the type signature alone won't give enough information to choose the order.

On a syntactic note, it would be nice if you could write ~ f a b c rather than ~(f a b c).