Automatic code generation via 'tilde conversions', enabling abilities as typeclasses

atacratic commented 5 years ago

This is a proposal for Unison to support a form of automatic code generation.

The mechanism is similar to Scala's implicit conversions, but with some key changes to make it more workable. It's also similar to Haskell's typeclass constraint-solving mechanism.

The application I have in mind is to enable bounded polymorphism via abilities (as per @runarorama's idea). That's what motivates the example below, but [updated with link] I discuss the specifics of that application in this comment on issue 502.

Proposal

`~` means 'Unison please try and write a conversion function for me'

The tilde character (~) gets a special meaning, 'try and write a conversion function'. It acts like a function ~ : A -> B, where A and B are to be resolved during typechecking. It gets replaced during compilation with a term that does the required conversion (or else compilation fails). This generated term is just a regular function, composed from other functions in the tilde conversion set using a type-driven search process (discussed under 'conversion search' below).

Conversion functions are composed from functions in the 'tilde conversion set'

The tilde conversion set is a configured set of terms, with types of the form A -> B for various A and B. The set is taken from the terms directly underneath some namespace, which defaults to .base.conversions. The choice of namespace is part of the configuration of the ucm session. ucm can be configured to use a different set of conversions, or no conversions at all (to disable this feature).

Example - automatically deriving a handle expression for the 'typeclasses' use case

Below is an example of using ~ to (essentially) write a handle expression for us, as part of discharging a typeclass-style constraint. The key bit is the ~(hello a) line, where ~ is causing Unison to generate a call to showTextConversion, which handles the Show Text ability that hello is asking for.

use .base
use .base.io

hello : a ->{IO, Show a} ()
hello a = printLine ("Hello " ++ show a)

foo : '{IO} ()
foo = 'let 
        a = "Alice"
        -- Use ~ to discharge the `Show Text` constraint
        -- This compiles today if you spell out !(showTextConversion '(hello a))
        ~(hello a)

-- Requires this to be in the tilde conversion set:
showTextConversion : '{Show Text, e} b -> '{e} b
showTextConversion b = '(handle showTextHandler in !b)

-- boring stuff follows
ability Show a where
  show : a -> Text

showTextHandler : Request (Show Text) a -> a
showTextHandler r = case r of
  { Show.show t -> k } -> handle showTextHandler in k t
  { a } -> a

There's another example in the gist here showing some non-trivial code generation, to satisfy a Show [Text] constraint.

tilde conversions are 'not part of the language'

During typechecking / code generation, the ~ is replaced with some actual code (a term of function type). It's this that actually gets put into the codebase. It's tagged with a metadata flag, in a similar way as with inline comments. That tag lets Unison treat it specially on view/edit.

When Unison is printing a term for us (say for view/edit), it first does typechecking and code generation again. It takes the code, replaces ~-tagged subterms with actual tildes, and tries to compile the result, using the current tilde conversion set configuration. For each ~, if it gets the same result as was in the codebase, it goes ahead and prints out a ~. If the result is not (structurally) the same term, then it prints out the whole term as it is in the codebase, no ~, no funny business. In this case the user sees the actual code that was originally generated before the add. This can happen if the tilde conversion set is different, or if the user doing the view/edit is running with tilde conversions disabled.
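The view-time decision described above can be sketched roughly as follows. This is only an illustration in Python, not Unison's implementation: terms are modelled as plain strings (so structural equality becomes string equality), and `Tagged`, `render` and the `search` callback are hypothetical names made up for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Tagged:
    """A subterm stored in the codebase carrying the (hypothetical) '~' metadata tag."""
    stored_term: str   # the conversion term that was generated at `add` time
    source_type: str   # the A in ~ : A -> B
    target_type: str   # the B in ~ : A -> B


# A search takes (source_type, target_type) and returns a term, or None on failure.
Search = Callable[[str, str], Optional[str]]


def render(tagged: Tagged, search: Optional[Search]) -> str:
    """Print '~' only when a fresh search reproduces the stored term exactly."""
    if search is None:
        # The viewing user has tilde conversions disabled: show the raw term.
        return tagged.stored_term
    regenerated = search(tagged.source_type, tagged.target_type)
    if regenerated == tagged.stored_term:
        return "~"
    # Different conversion set (or failed search): no ~, no funny business.
    return tagged.stored_term
```

The point of the sketch is that the codebase only ever stores `stored_term`; the `~` is a rendering decision that depends on the viewer's own conversion set.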

Commentary

Properties of this proposal

People can think of the ~ as just sugar giving customizable views of the same underlying code - they can look under the covers at the raw code without ~s whenever they want to.
There's no hidden magic. If there's code generation going on, it's always localized, and where it's going on there's the visual cue of the ~. Hopefully in our web/GUI code viewers/editors, you'll be able to hover over the ~ and see the input and output type (and even the conversion term that's been generated).
As long as we are conservative with what goes in .base.conversions, people won't be exposed to over-magic conversions dreamed up by other authors - they can stick to ones they know and trust (and conversely people can go wild with magic conversions in the privacy of their own ucm sessions, if they want to).
The code search/generation procedure can be changed without breaking code.
There is a convenient but reasonable lingua franca around ~ in Unison code, defined by .base.conversions.
There is no compiler performance or error message obfuscation penalty for this feature, except at subterms that have the metadata flag. Even there, the compiler performance cost is only paid when doing view/edit. (There's a note about error messages in the next section.)

Possible issues/concerns/questions

Another weird character appearing in terms, ~, to add to ! and '.
- (clearly syntax and terminology are all up for grabs, the ~ character and the 'tilde' and 'conversion' wording is just an initial suggestion)
Error message obfuscation where ~ is in use. I guess a good starting point if the conversion search fails would be to go back to square 1 and just log the source and target types, like with any type mismatch. But really we want good error messages around 'why didn't I find a conversion for you', which might be tricky.
This could be our first bit of user-specific 'how do I want my code to look' config. But I think we're expecting to get config like that, e.g. around preferred names when there are aliases.
The search algorithm will be complicated: what if it doesn't terminate, what if it's unpredictable, etc.? See 'conversion search' below.
When typechecking code with '~', are we learning information that refines types and needs to feed back into typechecking? How should that work if there are multiple ~? Can progress on one unstick another? (Idris's compilation process can handle this sort of thing via its elaboration monad. I don't know about Unison's typechecking though.)
There's a discontinuity between when Unison's doing a view and says 'yup, the codebase matches what I get when I do a fresh run of ~ search', and when it says 'nope, there's a discrepancy, I'm just going to fire the raw term at them to be on the safe side'. For example, suppose someone's removed showTextConversion from their conversion set. Then for code that resolves a Show [Text] they'll see something like showTextConversion (showListConversion (bar)) - they're suddenly faced with showListConversion as well, even though this might well still have been in their conversion set. You could imagine trying to mitigate this by trying more ~ searches in different places before outputting the code, but that might be slow or overcomplicated if taken too far.

Use cases and rationale

Use cases...

Making abilities a user-friendly alternative to typeclasses, by writing the formulaic handling code for you.
Default values, e.g. defaultUserConfig = ~() : WidgetConfig, where WidgetConfig is constructed in turn from a bunch of defaults. I'm not claiming this would be great design, but it's the kind of thing people want to do, and with this proposal they can do it without harm to others...
Lifting to wrapper types, e.g. converting Text -> ColorText

Rationale:

Make abilities work as Unison's take on typeclasses.
Type-driven code generation is a good thing. https://twitter.com/pigworker/status/35132947946274816
The engine of what I'm proposing, the constraint solver / code search / theorem prover thing, is a long-standing part of Haskell since it got typeclasses.
Scala implicit conversions filled a real need. But (a) the fact conversions were happening was invisible at the conversion site, and (b) your hare-brained conversions show up when I'm trying to code. Both issues are fixed in this proposal, thanks to Unison's code-is-not-text approach (plus the ~, and the per-user conversion config).

Conversion search

I'd propose that this mechanism

will go ahead and try to find chains of conversions (because we like composability)
can instantiate type parameters
does not let any arbitrary choices in its implementation become visible to the user (e.g. no unjustified dependence on the order it tries different strategies)
takes care to avoid cycles, or ever-expanding types
when it has a choice between A -> B and A -> C, throws a failure.

Suppose your tilde conversion set contains the following.

p : U -> V
q : V -> W
r : B a -> C a
s : C a -> H a -> D a
t : B a -> H a

Then it will

find a U -> W for you: u -> q (p u)
find a B a -> D a for you: b -> s (r b) (t b) - note that this created a sub-goal, to find an H a in order to call s.
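As a rough illustration of the simplest case (the U -> W chain), here is a Python sketch of a chain search. It is not the proposed algorithm itself: it only handles one-argument conversions between concrete type names (no type parameters, no sub-goals like the H a above, no effects), and it reads the ambiguity rule as 'fail if more than one cycle-free chain reaches the target', which is an assumption on my part. The names `find_chains`, `resolve`, `AmbiguousConversion` and `NoConversion` are invented for the sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

# Conversion set: name -> (source type, target type), one-argument functions only.
Conversions = Dict[str, Tuple[str, str]]


class NoConversion(Exception):
    """No chain of conversions leads from the source type to the target type."""


class AmbiguousConversion(Exception):
    """More than one chain exists, so picking one would be an arbitrary choice."""


def find_chains(conversions: Conversions, source: str, target: str,
                seen: FrozenSet[str] = frozenset()) -> List[List[str]]:
    """Return every cycle-free chain of conversion names from source to target."""
    if source == target:
        return [[]]
    seen = seen | {source}  # never revisit a type already on the current path
    chains = []
    for name, (src, tgt) in sorted(conversions.items()):
        if src == source and tgt not in seen:
            for rest in find_chains(conversions, tgt, target, seen):
                chains.append([name] + rest)
    return chains


def resolve(conversions: Conversions, source: str, target: str, var: str = "x") -> str:
    """Build the text of the conversion function, or fail if none or several exist."""
    chains = find_chains(conversions, source, target)
    if not chains:
        raise NoConversion(f"{source} -> {target}")
    if len(chains) > 1:
        raise AmbiguousConversion(f"{source} -> {target}: {chains}")
    term = var
    for name in chains[0]:
        term = f"{name} {term}" if term == var else f"{name} ({term})"
    return f"{var} -> {term}"


conversions = {"p": ("U", "V"), "q": ("V", "W")}
print(resolve(conversions, "U", "W", var="u"))  # prints: u -> q (p u)
```

Because every recursive call adds the current type to `seen`, the search terminates on any finite conversion set, and iterating over `sorted` names keeps the result independent of dictionary order.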

Note that the search probably has to play about with ! and ', as it did in the foo example above.

There's clearly precedent here from Haskell constraint-solving, and Scala implicit search. I'm sure there's a bunch of literature... I haven't had a look though. Probably this is a hard but well-studied problem with some off-the-peg answers?

The addition of effect typing is a fun extra element though.

I guess ideally one would have proofs that the search terminates, produces a unique canonical result (of the correct type), and succeeds if it's possible to do so - and know what conditions those things impose on the tilde conversion set. I would propose not sweating this too much though, at least not in a formal way.

Maybe if we're initially focussing this on the typeclass use case, then we can start off by implementing a very simplified search where we just try and find things to eliminate the E in A ->{E} B.

Possible extensions

~(a, b, c) to fire in a tuple of starting points to the search. Maybe c is the main thing being converted, and a and b are things the search will need along the way (e.g. clues about how to handle some abilities). [edit - actually this comes for free if you put the tuple projection functions into the tilde conversion set]
In a world in which Unison supports code generation via metaprogramming, allow things in the conversion set like %deriveGeneric : Type a -> Term (Generic a).
Allow use statements (or similar) in the code to add things to the conversion set, just while we're in that lexical scope. This would need those statements to be saved in the codebase. Maybe this is equivalent to the ~(a, b, c) idea.
Allow the code search to rove more wildly over the codebase, wiring together map, flip, ap and whatever for you. Joomy Korkut's Idris library hezarfen (thesis) is an example of what's possible here, at least in a dependently typed setting. It uses Idris's typechecker plugin architecture (elaborator reflection) as well as a bunch of theorem prover tech... The UI is editor-based - a magic keystroke to write the code before your eyes, and no attempt to elide the result when viewing later.

Alternative approaches

For typeclasses? Who knows? Some discussion in #502. If we do it via abilities, then I don't think asking the user to write their own handle expressions to discharge typeclass constraints is going to fly - too much pain to do something 'it should be able to do for me'.
Do the code generation as part of editing, i.e. you hit a keystroke and the term appears in front of you. Arguably less magic since you're forced to read the code you're going to add. But if the code was formulaic and information-free (given the types), then it's still noisy to have to look at it. This approach makes more sense when you're doing a search in a large space of possibilities, and just because you find a term of the correct type doesn't mean it's what the user wanted.
Not skin this as a general-purpose conversions system, but instead say it's just for discharging ability constraints without having to hand-write handle expressions. Doesn't seem like the right approach to me, but probably there's an argument that otherwise this approach is 'too big a hammer'.

atacratic commented 5 years ago

Note that there's now an accompanying discussion of the application to the typeclass use case in issue 502 here.

Another issue: how does tilde conversion interact with type inference? If you leave off the signature from foo in the original example:

foo = 'let 
        a = "Alice"
        ~(hello a)

then it's going to have a hard time working out whether you meant '{IO} () or '{IO, Show Text} (). Generally ~ introduces a degree of freedom which I'm not sure how the typechecking algorithm can cope with. Maybe if the types aren't constrained enough to pin down the A and B in ~ : A -> B, then it should throw an error asking you to add a signature or annotation. Would that be a distressing weakening of our inference story?

aryairani commented 5 years ago

Nice write-up; I didn’t notice it until your recent comment. I had a couple thoughts:

"when it has a choice between A -> B and A -> C, throws a failure."

Did you mean, when it has a choice between A -> X -> B and A -> Y -> B?

Second, it would be great if you could implement this in pure Unison using an as-yet nonexistent API/support for ucm extensions.

atacratic commented 5 years ago

We discussed this on Slack: you've spotted that the example is broken, because it doesn't follow the rule you quoted. Specifically, when trying to find a B a -> D a, it should give up due to having both r and t to choose from. Not sure if the rule is wrong or just the example, but it seemed to both of us this was just a superficial glitch, and that the idea can be made to work.

And yes being able to implement this through some kind of metaprogramming / language plugin kind of API would be great!

anovstrup commented 4 years ago

@atacratic This is a neat idea!

In the near term, it does seem prudent to focus on a simpler subset of this capability, limited to discharging abilities. "Conversions" could be limited to those of the form Request a t -> r, and ~ expr could be rewritten as handle handle ... handle expr with h_n ... with h_2 with h_1, where the h's are all the applicable handlers for expr. I think this would be a lot simpler on the search, type inference, and type checking front.
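To make the rewrite concrete, here is a small Python sketch that builds the nested handle text. It is purely illustrative: abilities and handlers are plain strings, the conversion list is assumed to be an ordered list of (ability, handler) pairs with at most one handler per ability, and `discharge` and `rewrite_tilde` are hypothetical names.

```python
from typing import List, Tuple


def discharge(expr: str, handlers: List[str]) -> str:
    """Wrap expr in nested handles; handlers is [h_1, ..., h_n], with h_n innermost."""
    result = expr
    for handler in reversed(handlers):
        result = f"handle {result} with {handler}"
    return result


def rewrite_tilde(expr: str, required: List[str],
                  conversion_list: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Apply every applicable handler, in conversion-list order.

    Returns the rewritten term and the abilities left undischarged, which
    stay in the type of the result (e.g. IO in the foo example).
    """
    applicable = [(ability, h) for ability, h in conversion_list if ability in required]
    handled = {ability for ability, _ in applicable}
    remaining = [ability for ability in required if ability not in handled]
    return discharge(expr, [h for _, h in applicable]), remaining


term, left_over = rewrite_tilde(
    "hello a",
    required=["IO", "Show Text"],
    conversion_list=[("Show Text", "showTextHandler")],
)
print(term)       # prints: handle hello a with showTextHandler
print(left_over)  # prints: ['IO']
```

Taking the handler order from the position in the conversion list is one way to make the ordering concern explicit rather than leaving it to the search.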

I also wonder whether, even for the general case, it would be better to have a conversion list (or partially ordered set)? For handlers, the order they're applied can be significant, and the type signature alone won't give enough information to choose the order.

On a syntactic note, it would be nice if you could write ~ f a b c rather than ~(f a b c).

unisonweb / unison