unisonweb / unison

A friendly programming language from the future
https://unison-lang.org

A Unison API for managing a Unison codebase #922

Open runarorama opened 5 years ago

runarorama commented 5 years ago

Proposal draft

The Unison Codebase Manager, ucm, is a command-line tool for exploring, manipulating, and organizing a Unison codebase. But it would be so much nicer if we could do all of those things in Unison.

We already have the Link.Term and Link.Type types from #901 to support documentation. These are first-class references (i.e. hashes) to Unison terms and types in the codebase, respectively.

Motivating example: refactor a type

Say we have an ability type with a lot of constructors:

ability MyAPI where
  one : One
  two : Two
  three : Three
  …

Let’s say it has 50 constructors, so this is an API with a large surface. And we want to change one of the constructors: let’s say three should produce Nat instead of Three.

The ucm workflow would be to edit the MyAPI type and make the small change, then update to construct a patch, and then add to the patch a mapping from the constructors of the old MyAPI to the constructors of the new MyAPI. We’d have to issue a resolve.term command for every constructor. Doing this 50 times is repetitive and boring.

But we have a programming language! So why not just write code to do the repetitive and boring task for us?

What we’d like to do is write a Unison program that does something like the following:

A Codebase ability

To support this kind of thing, we need some operations on the codebase.

For example, we want to get a list of constructors of a type:

Codebase.constructorsOf : Link.Type ->{Codebase} [Link.Term]

And get the names of a term or type:

Codebase.termNamesAt 
  : Namespace -> Link.Term ->{Codebase} [Name]
Codebase.typeNamesAt 
  : Namespace -> Link.Type ->{Codebase} [Name]

The Namespace argument is necessary here since names are different depending on the namespace. You could supply the root namespace . to get all the names globally for a term or type.
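
To make the namespace-relative lookup concrete, here is a minimal Python model of these two operations. It is only a sketch of the intended behaviour, not Unison and not an existing API: the ToyCodebase class, the string hashes such as "#c1", and the dotted-name storage are all assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ToyCodebase:
    """Hypothetical in-memory stand-in for the proposed Codebase ability."""
    # type hash -> constructor hashes, in declaration order
    constructors: dict[str, list[str]] = field(default_factory=dict)
    # fully qualified name -> term hash
    term_names: dict[str, str] = field(default_factory=dict)

    def constructors_of(self, type_link: str) -> list[str]:
        # Models Codebase.constructorsOf
        return list(self.constructors.get(type_link, []))

    def term_names_at(self, namespace: str, term_link: str) -> list[str]:
        # Models Codebase.termNamesAt: names are reported relative to
        # the given namespace; "." stands for the root.
        prefix = "" if namespace == "." else namespace + "."
        return sorted(
            name[len(prefix):]
            for name, link in self.term_names.items()
            if link == term_link and name.startswith(prefix)
        )


codebase = ToyCodebase(
    constructors={"#MyAPI": ["#c1", "#c2"]},
    term_names={
        "app.MyAPI.one": "#c1",
        "app.MyAPI.two": "#c2",
        "legacy.uno": "#c1",
    },
)
root_names = codebase.term_names_at(".", "#c1")   # every name for #c1
app_names = codebase.term_names_at("app", "#c1")  # only names under app
```

In this model the same hash has two names at the root but only one under app, which is the reason the lookup takes a Namespace argument.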

It might be useful to ask the codebase for the contents of a namespace:

Codebase.list : Path ->{Codebase} [Link]

Where Link refers to a term, a type, a patch, or another namespace:

type Link = Term Link.Term
          | Type Link.Type
          | Patch Patch
          | Namespace Namespace
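
As a rough illustration of why Link needs a Namespace case, the following Python sketch walks a namespace recursively and collects every term beneath it. The class names, the LISTING table, and the choice to store a path or name inside the namespace and patch cases are assumptions of this model; the proposal itself stores Patch and Namespace values there.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TermLink:
    hash: str


@dataclass(frozen=True)
class TypeLink:
    hash: str


@dataclass(frozen=True)
class PatchLink:
    name: str


@dataclass(frozen=True)
class NamespaceLink:
    path: str


# Python spelling of the four-way Link sum type
Link = Union[TermLink, TypeLink, PatchLink, NamespaceLink]

# Hypothetical namespace contents, keyed by path
LISTING: dict[str, list[Link]] = {
    ".": [NamespaceLink("app"), PatchLink("patch")],
    "app": [TypeLink("#MyAPI"), TermLink("#c1"), NamespaceLink("app.util")],
    "app.util": [TermLink("#helper")],
}


def codebase_list(path: str) -> list[Link]:
    # Models Codebase.list: one level of a namespace, not its descendants
    return LISTING.get(path, [])


def all_terms(path: str) -> list[TermLink]:
    # Depth-first walk: keep terms, descend into child namespaces,
    # ignore types and patches.
    found: list[TermLink] = []
    for link in codebase_list(path):
        if isinstance(link, TermLink):
            found.append(link)
        elif isinstance(link, NamespaceLink):
            found.extend(all_terms(link.path))
    return found


terms_under_root = all_terms(".")
```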

We also want to be able to retrieve and store patches:

Codebase.getPatch : Name ->{Codebase} Patch
Codebase.putPatch : Name -> Patch ->{Codebase} ()

What’s a patch?

We’ll need Patch to support basic operations that allow us to replace or deprecate terms and types:

Patch.replaceTerm : Link.Term -> Link.Term -> Patch -> Patch
Patch.deprecateTerm : Link.Term -> Patch -> Patch
Patch.deprecateType : Link.Type -> Patch -> Patch
Patch.replaceType : Link.Type -> Link.Type -> Patch -> Patch

We also want to be able to combine patches, and we’ll need the empty patch:

Patch.empty : Patch
Patch.union : Patch -> Patch -> Patch

It would be useful to be able to ask a patch for its contents:

Patch.termReplacements : Patch -> Map Link.Term Link.Term
Patch.typeReplacements : Patch -> Map Link.Type Link.Type
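
One possible reading of these Patch operations, sketched in Python as an immutable value with term edits only. Several details here are assumptions rather than part of the proposal: the argument order of replace_term (old first, then new), representing a deprecation as a mapping to None, and letting the right-hand patch win when union sees the same old term twice.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Patch:
    """Toy model of Patch, covering term edits only."""
    # old term hash -> new term hash, or None if the term is deprecated
    term_edits: dict[str, Optional[str]] = field(default_factory=dict)

    def replace_term(self, old: str, new: str) -> Patch:
        # Returns a new Patch; the receiver is left unchanged
        return Patch({**self.term_edits, old: new})

    def deprecate_term(self, old: str) -> Patch:
        return Patch({**self.term_edits, old: None})

    def union(self, other: Patch) -> Patch:
        # Assumption: on conflicting edits, `other` takes precedence
        return Patch({**self.term_edits, **other.term_edits})

    def term_replacements(self) -> dict[str, str]:
        # Deprecations are not replacements, so they are filtered out
        return {o: n for o, n in self.term_edits.items() if n is not None}


EMPTY = Patch()

p = EMPTY.replace_term("#old3", "#new3").deprecate_term("#old9")
q = EMPTY.replace_term("#old1", "#new1")
merged = p.union(q)
```

Under these assumptions Patch.empty acts as an identity for Patch.union, which is what makes it a convenient starting value for a fold.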

And of course the whole point of a patch is to apply it to the codebase at a particular namespace:

Codebase.applyPatch : Namespace -> Patch ->{Codebase} ()
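
A small Python sketch of what applying a patch at a namespace could mean for name bindings. It is deliberately simplified and rests on assumptions: names are stored as a flat dictionary of dotted names, the function returns a new dictionary instead of performing a Codebase effect, and it does not rewrite definitions that depend on the replaced terms, which a real implementation would presumably have to do.

```python
from __future__ import annotations


def apply_patch(
    names: dict[str, str],
    namespace: str,
    replacements: dict[str, str],
) -> dict[str, str]:
    """Return a copy of `names` with replacements applied under `namespace`."""
    prefix = "" if namespace == "." else namespace + "."
    return {
        # Bindings outside the namespace keep their old hash
        name: replacements.get(link, link) if name.startswith(prefix) else link
        for name, link in names.items()
    }


names = {
    "app.MyAPI.three": "#old3",
    "app.MyAPI.one": "#c1",
    "legacy.three": "#old3",
}
patched = apply_patch(names, "app", {"#old3": "#new3"})
```

Here only the binding under app is redirected to the new hash; legacy.three still refers to the old one because it lies outside the target namespace.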

Putting it all together

Using just these operations, we can perform the refactoring on our MyAPI type.

constructorMap 
  : Namespace 
  -> Link.Type
  ->{Codebase} Map Name Link.Term
constructorMap p typ = 
  go ctor map = 
    termNames = Codebase.termNamesAt p ctor
    foldr (go' ctor) map termNames
  go' ctor termName map = 
    Map.insert termName ctor map
  foldr go Map.empty (Codebase.constructorsOf typ)

nameBasedUpgrade
  : Namespace 
  -> Link.Type
  -> Link.Type
  ->{Codebase} Patch 
nameBasedUpgrade oldType newType =
  newCtors = 
    constructorMap Codebase.constructorsOf newType
  oldCtors = 
    constructorMap Codebase.constructorsOf oldType
  go ctor patch =
    case ctor of (name, link) -> 
      Patch.replaceTerm link
                        (Map.lookup name oldCtors)
                        patch
  foldr go Patch.empty (Map.toList newCtors)

upgradeMyAPI = 
  nameBasedUpgrade (typeLink MyAPI#oldHash)
                   (typeLink MyAPI)

Here the syntax typeLink T is built-in Unison syntax for getting a Link.Type. The type MyAPI#oldHash is not valid syntax but is supposed to represent whatever the actual hash-qualified name of the old version of MyAPI would be.

One thing to note is that constructorMap and nameBasedUpgrade are totally reusable general-purpose functions that can be used to perform this kind of migration on any type. There are probably lots of other useful general-purpose refactorings we could write using this kind of API.
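
For readers who want to experiment with the idea outside Unison, here is a self-contained Python model of the name-based matching. It is a sketch under stated assumptions, not the proposed implementation: hashes are plain strings, the old constructors are assumed to still be named under a separate namespace (v1) while the new ones live under v2, and constructors with no counterpart are reported instead of being looked up blindly.

```python
from __future__ import annotations


def constructor_map(
    term_names: dict[str, str], namespace: str, ctor_links: list[str]
) -> dict[str, str]:
    """Map each name visible under `namespace` to its constructor hash."""
    prefix = "" if namespace == "." else namespace + "."
    wanted = set(ctor_links)
    return {
        name[len(prefix):]: link
        for name, link in term_names.items()
        if link in wanted and name.startswith(prefix)
    }


def name_based_upgrade(
    old_ctors: dict[str, str], new_ctors: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Pair constructors by name.

    Returns (old hash -> new hash, names with no old counterpart).
    """
    replacements: dict[str, str] = {}
    unmatched: list[str] = []
    for name, new_link in new_ctors.items():
        old_link = old_ctors.get(name)
        if old_link is None:
            unmatched.append(name)      # the lookup-failure case
        elif old_link != new_link:
            replacements[old_link] = new_link
    return replacements, unmatched


# Hypothetical name table: v1 holds the old type, v2 the edited one
term_names = {
    "v1.MyAPI.one": "#old1",
    "v1.MyAPI.two": "#old2",
    "v1.MyAPI.three": "#old3",
    "v2.MyAPI.one": "#new1",
    "v2.MyAPI.two": "#new2",
    "v2.MyAPI.three": "#new3",
    "v2.MyAPI.four": "#new4",
}
old = constructor_map(term_names, "v1", ["#old1", "#old2", "#old3"])
new = constructor_map(term_names, "v2", ["#new1", "#new2", "#new3", "#new4"])
replacements, unmatched = name_based_upgrade(old, new)
```

The resulting replacements dictionary has the same shape as the output of Patch.termReplacements, so it could be folded into a patch with Patch.replaceTerm. What to do with the unmatched names is left open here.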

Future expansion

The Codebase API could be expanded later to allow more operations on the codebase:

Note that this API doesn’t include any internal representation of Unison terms and types. We can ask the codebase for links, but not for the actual terms or types. If we had such metaprogramming facilities, this API could be expanded to allow editing the actual code rather than just the codebase structure.

pchiusano commented 5 years ago

Overall looking pretty good. Some thoughts below -

I feel like there could be a less dry-sounding title and a more exciting intro. Even though the motivating use case isn't terribly exciting, the path this is on (of not needing to do tedious text manipulation to make structured changes to a codebase) is a big deal which will bring huge quality of life improvements. I'm not sure of the best way to capture this, but I feel like a couple more paragraphs at the start could do it...

... once you get into the actual API, that's an appropriate amount of detail but most people aren't going to read it in detail.

We already have the Link.Term and Link.Type types from #901 to support documentation. These are first-class references (i.e. hashes) to Unison terms and types in the codebase, respectively.

This reads as a non sequitur; it's not clear how it relates to what follows. How about introducing this "just in time", right at the point where you first use those types in the proposed API?

The ucm workflow would be to edit the MyAPI type and make the small change, then update to construct a patch, and then adding to the patch a mapping from the constructors of the old MyAPI to the constructors of the new MyAPI. We’d have to issue a resolve.term command for every constructor. Doing this 50 times is repetitive and boring.

Might also be good to include a few tantalizing ideas at the end for other (currently tedious) workflows that might be automated via an API like this.

I think not too many people will follow this explanation, so either signpost that or just omit it. I think you could just say something like "refactoring this type with Unison today is rather painful (see <link> if you are curious to know more about the current recommended workflow), but what we'd like to do is straightforward if we have a way of programmatically manipulating a Unison codebase:" And then give those 3 steps.

One observation that I think is interesting, and which you might want to call out, is that the algorithm being used for the upgrade is one that we don't really think about - it is sort of implicitly what is happening when you modify a codebase by mutating text files, but here we are actually taking a step back and coming up with a very explicit algorithm to implement the codebase transformation we want, and then writing regular code to do it. That is a big shift in perspective, I think, since we are accustomed to evolving a codebase in a very first-order, manual way.

atacratic commented 5 years ago

So why not just write code to do the repetitive and boring task for us?

As I was reading the first 75% of this article, I was thinking "surely you're not expecting me to write a metaprogram every time I want to do the most basic type upgrade?" I think probably you're not, but that's only based on inference from the nameBasedUpgrade code snippet.

I want the default behaviour when I slurp my updated type to be Unison proposing to construct a patch using the 'same name => same constructor' heuristic. I think there should be some reassurance early in the post that that will be the case, otherwise many readers will come away with the impression 'hey this structured codebase thing is actually going to be a pain to use after all'.

nameBasedUpgrade

Worth typechecking the code before publishing - I spotted that nameBasedUpgrade takes a Namespace according to its signature, but nowhere else.

You don't handle the 'Map.lookup fails' case, maybe that's fine for the example.

And Codebase.list takes a Path but elsewhere you've used Namespace.

Other/future

Maybe out of scope for the post and the currently-planned API but I was wondering about the staging aspects of this. When can you run code that uses Codebase, and when can't you? I guess there is maybe a flavour of run that handles Codebase. Will it let you do IO at the same time? (Hopefully yes because then you get 'type providers'. That might be an exciting future direction to mention.) As you are running code in Codebase, are any of the changes you are making actually taking effect? Do they take effect atomically after your run command is finished? Or is there a primitive to call to say 'end of codebase transaction, now let me run the thing I've just created'. How do automated edits appear in the history?

It will be interesting when ucm has commands that are implemented in terms of Unison Codebase functions. Kind of another flavour of builtin. I wonder how the versioning issues for that will shake out.

anovstrup commented 4 years ago

If this API is designed so that all ucm commands could (at least notionally) be implemented against it (e.g., as built-ins with API-conforming types), then it could also support the capability to customize ucm (with new commands or renamed/aliased commands) from within the codebase. This could accomplish the goals of issues like #809 in a more general way.

The missing piece would be some mechanism for ucm to pick up ucm commands (including user-defined commands) from the codebase. One simple mechanism would be for ucm to automatically recognize all values of a particular built-in type (UnisonCommand) within a particular namespace (e.g., ucm.commands) as commands that can be invoked in ucm. With this mechanism in place, users could rename or alias commands just by moving/aliasing the corresponding terms and could even implement new commands against the API so that they'd behave just like regular commands.

aryairani commented 4 years ago

@anovstrup Maybe the mechanism could be general enough that you could implement the convention you described (parsing user commands according to definitions in a particular namespace) as a Unison function that uses this API ;-)

commandParser : Text ->{Ucm} ()