webyrd / Barliman

Prototype smart text editor
MIT License
1.04k stars 30 forks source link

Barliman

Barliman 4.3 is joint work with Greg Rosenblatt.

A new version of Barliman is under development, with collaborators in alphabetical order: Michael Ballantyne, Will Byrd, Greg Rosenblatt, Kanae Tsushima (対馬かなえ), Rob Zinkov. Coming soon! :)


He thinks less than he talks, and slower; yet he can see through a brick wall in time (as they say in Bree).

– Gandalf the Grey, about Barliman Butterbur.

From The Lord of the Rings: The Fellowship of the Ring by J.R.R. Tolkien


Barliman News

4 December 2016


Barliman overview

Barliman is a prototype "smart editor" that performs real-time program synthesis to try to make the programmer's life a little easier. Barliman has several unusual features:

Barliman is general enough to handle multiple programming languages. In fact, the user can even specify their own programming language, or change the semantics or syntax of one of the default languages that come with Barliman. The default language for Barliman is a small but Turing-complete subset of side-effect-free Scheme that supports recursion, list operations, higher-order functions, multi-argument and variadic functions, and a few other features.

Purpose of Barliman

Barliman in action

synthesizing append from scratch

Here are a few screenshots of Barliman, using the Mac implementation as of June 4, 2016:

Update: I've added a few newer screenshots from June 16, 2016. Once Barliman stops changing as rapidly I'll update all the screenshots.

Update – 10 October 2016: Please see the interesting_examples directory for more recent examples and screenshots.

Update – 4 December 2016: The Clojure/conj 2016 talk is currently the best source of up-to-date information on Barliman.

The following descriptions are based on an older and far less capable version. TODO: update them.

The first screenshot shows the main editor window. The Scheme Definition edit pane contains the complete (fully instantiated) and correct definition of append, the list concatenation function in Barliman's default "miniScheme" language. append will be our simple running example in these screenshots. The edit window also contains three tests; each test contains an input expression, and the expected value of that expression. The Best Guess pane, which is not editable by the user, contains the same fully instantiated definition of append as in the Scheme Definition edit pane.

All the text in the editor window is black, indicating that all the information in the editor is consistent and valid. The definition of append is a valid symbolic expression (s-expression), and is indeed a syntactically valid miniScheme definition. The test expressions and expected values are syntactically valid, and consistent with each other.

The editor window displayed in this first screeenshot is similar in spirit to a modern integrated development environment (IDE) that runs tests whenever the code to be tested is modified.

Let's see how we might have gotten to the final version of append using Barliman.

Screenshot 1: Fully instantiated definition of append:

append example 1 -- fully instantiated code

Screenshot 2 shows the empty main editor window, immediately after starting Barliman. We know we want to define append, so in true test-drived development style we begin by writing our tests cases.

Screenshot 2:

append example 2 --

Screenshot 3 shows the main editor window with our three tests. The first test says that if we append the empty list to the empty list, we should get back the empty list. You should be able to figure out the other two tests.

The text for all three tests are red, indicating that none of the tests pass. This isn't surprising, perhaps, since we haven't started to define the append function.

(From an interface design standpoint, whether to use colors, which colors to use, which text fields to hilight, etc., are all open questions in my mind. Over time I hope to make it much more clear exactly which part of the code is failing, and why.)

Screenshot 3:

append example 3 --

Screenshot 4 shows the main editor window after we have begun defining append in the Scheme Definition edit pane. Our parentheses are not balanced – we haven't yet typed a closing parenthesis for the define form. Because of the missing parenthesis, the definition is not a legal Scheme s-expression. The tests cannot pass, of course, since append isn't even an s-expression. Barliman recognizes this, and turns the text in the Scheme Definition edit pane, and the text in the test edit fields, a sickly green color.

(Future versions of Barliman should include a structured editor that will automatically insert balanced parentheses.)

Screenshot 4:

append example 4 --

Screenshot 5 shows the main editor window after we have added the closing parenthesis in our partial definition of append in the Scheme Definition edit pane. The partial definition of append is now a legal s-expression. However, the definition of append is not syntactically valid according to the rules of miniScheme. Of course, this invalid definition of append causes all the tests to fail as well. Barliman recognizes this, and turns the text in the Scheme Definition edit pane, and the text in the test edit fields, red.

(Currently Barliman doesn't actually check that definitions are grammatically correct. Rather, Barliman uses evaluation of the tests to check whether code is semantically legal, rather than syntactically legal. Future versions of Barliman will probably include explicit grammars that are checked, in addition to semantic rules.) (Update: Barliman now includes a relational parser for the miniScheme language, as is shown in the newer screenshots at the end.)

Screenshot 5:

append example 5 --

In screenshot 6 we can see that the programmer has partially specified the defintion of append. The definition is a syntactally-correct s-expression, and indeed is a syntactically correct use of miniScheme's define form. Importantly, the definition of append is only partially specified, and contains four (logic) variables (A, B, C, and D) representing unknown subexpressions. In Barliman variables representing unknown subexpressions are single-letter upper-case variables A through Z. (Note to Schemers: The comma (,) that usually occurs before these letters is necessary because the code in the Scheme Definition edit pane is implicitly quasiqoted.)

Given the partially-specified defintion of append in the Scheme Definition edit pane, along with the three tests, Barliman is able to correctly "guess" the code corresponding to these variables. The correct and complete definition of append is displayed in the Best Guess pane. Barliman guesses the correct code in this case in a second or less. All of the text in the main editor window is black, indicating that all of the code is syntactically correct, and that all three tests pass given the completed definition of append shown in the Best Guess pane.

Screenshot 6:

append example 6 -- partially instantiated code filled in

Screenshot 7 shows an incorrect partial definition of append. As in the previous screenshot, the partial definition of append contains variables representing unknown subexpressions (the A and B and C). However, in this case the first argument to cons is incorrect. The first argument to cons should be (car l), as shown in screenshot 1. Alternatively, the first argument to cons could be an incomplete expression containing a variable representating an unknown subexpression, such as (car ,B) from screenshot 6, provided that this incomplete expression is consistent with the expression (car l). Here, however, the first argument to cons is the expression (cdr l). The red text for tests 2 and 3 indicate that these tests are incompatible with the partial definition of append in the Scheme Definition edit pane. That is, there are no legal miniScheme expressions that could be substituted for the variables A, B, and C that would make tests 2 and 3 pass.

The spinning progress indicator to the upper-right of the Best Guess pane indicates that Barliman is trying to find expressions for variables A, B, and C that will make all of the tests pass. Of course this is impossible – Barliman should be a little smarter, and cut off the Best Guess computation when one of the individual tests fails.

The important thing about this example is that Barliman was able to prove that the partial definition of append is incorrect, without append being fully defined. More precisely, the partial definition of append is inconsistent with respect to tests 1 through 3 and the semantics of miniScheme (which can be edited by the programmer).

Screenshot 7:

append example 7 -- partially instantiated code incompatible with tests

Screenshot 8 shows another partially-instantiated, but incorrect, definition of append. The base case of append should be s instead of l, yet all the text is in black, indicating that the individual tests are compatible with the definition so far. The problem is that we don't have a test that exposes that this partial definition is wrong. We'll fix this in the next screenshot.

This is one danger of using tests for feedback, of course – in general, no finite number of tests is sufficient to prove our definition is correct. I hope that future versions of Barliman will include other ways to specify the behavior of programs, which might include specifying program properties, or providing a "reference" implementation of a function that is being redefined to perform faster, etc.

Screenshot 8:

append example 8 -- partially instantiated code missing a test

In screenshot 9 we add a new test, test 4, that shows that the base case of append is incorrect. Sure enough, test 4's text immediately turns red, indicating it is incompatible with our partial definition.

Screenshot 9:

append example 9 -- partially instantiated code with the missing test

Screenshot 10 shows a limitation of Barliman's program synthesis. Here the partially-specified definition of append contains only a single variable, A, representing an unknown subexpression. Ideally Barliman would quickly figure out that A should be the expression (cdr l). However, for this example Barliman seems to get "stuck" – we can see the spinning progress indicators to the upper-right of the Best Guess pane and the Test 2 and Test 3 edit fields, indicating that Barliman is still "thinking". I let Barliman run for a minute or two, but it didn't find a value for A in that time. (Adding the notion of "parsimony" to Barliman, so it tries to generate the smallest terms first, might help with this example.)

We could allow Barliman to keep thinking – perhaps it would find the answer in five minutes, or in an hour (provided our computer has enough RAM!). However, in practice we would probably try filling in A manually. If we were to type (cdr ,B) in place of ,A, Barliman would immedialy guess in the correct, trivial subexpression l for the variable B.

This example shows how program synthesis in Barliman can be much slower than we might hope in certain cases. However, since Barliman is a text editor, and since multicore computers with lots of RAM are now ubiquitous, I see these examples from a "glass half full" perspective. Sometimes Barliman can help you, either by guessing the rest of your incomplete definition, or by proving that there is no completion for your partially-specified definition that is consistent with your tests. In this case you win. Sometimes Barliman can't help you, in which case you use it like a regular text editor. In this case you use more CPU cyles and RAM on your machine, but otherwise edit text normally.

Of course, Barliman isn't currently a particularly good text editor, especially compared to Emacs with paredit mode, to take one example. This problem is only a matter of engineering – in fact, Barliman-like functionality could be added to Emacs, or to another editor. Or Barliman could get more sophisticated editing abilities.

A bigger drawback is that the semantics for the language you are writing in must be specified in miniKanren. This is fine if you are writing in a minimal Scheme subset, such as miniScheme. This isn't so great if you want to program in full Clojure or Racket or Javascript or Ruby. Finding ways to scale this technology is an open problem. The solution may not be miniKanren or constraint logic programming, but rather another synthesis approach. I don't know. I do hope, however, that Barliman will make people think about how synthesis capabilities can be integrated into editors, especially for dynamic languages.

Screenshot 10:

append example 10 --


Update: Here are a few newer screenshots, as of June 16, 2016, that show off the relational parser for miniScheme that I added a couple of days ago.

Screenshot 11 is an updated version of screenshot 5, showing the new relational parser at work. The definition of append is syntactically incorrect, which is represented by the purple text in the Scheme Definition edit pane. The tests are syntactically correct, but fail, and are therefore shown in red text.

Screenshot 11:

append example 11 --

Screenshot 12, like screenshot 11, shows a syntactically incorrect definition of append. In this case the keyword lambda appears as the body of a lambda expression. In miniScheme, as in Scheme, lambda is a special form rather than a function; the keyword lambda cannot appear "by itself". Hence the purple text in the Scheme Definition edit pane.

Once again, the tests are syntactically valid, but fail, and so are shown in red text.

Screenshot 12:

append example 12 --

Screenshot 13 is identical to screenshot 12, except that the formal parameter to the lambda expression in the Scheme Definition edit pane has been changed from x to lambda. This formal parameter shadows the lambda keyword, allowing lambda to appear by itself in the body of the lambda expression.

Once again, the tests are syntactically valid, but fail, and so are shown in red text.

This example shows that the relational parser keeps track of the environment, variable scope, and shadowing.

Screenshot 13:

append example 13 --

Screenshot 14 shows a syntactically legal partial definition of append in the Scheme Definition edit pane. Three of the tests are syntactically legal, and are (individually) consistent with the partial definition of append; therefore, the text for these tests are shown in black.

The third test, however, is syntactically incorrect. This is because in miniScheme, as in Scheme, and is a special form rather than a function, and therefore cannot be passed into the call to append. Since the third test is syntactically illegal, it is shown in purple text.

Screenshot 14:

append example 14 --


Running Barliman from Source

To build and run Barliman on Mac, open Barliman/cocoa/Barliman.xcodeproj in Xcode, and click the run button.


Advantages and limitations of Barliman

Advantages of Barliman

Barliman is flexible. Barliman can handle operational semantics for various programming languges. Users can add their own semantics, or modify the semantics for languages that are included with Barliman. Barliman does not require the language be statically typed, or that the user has supplied enough tests to fully synthesize the function being defined.

Barliman is interactive. Any change to the definition of a function, the corresponding tests, or even the semantics immediately re-triggers the program synthesis solver.

Limitations of Barliman

Barliman can be extremely slow when it comes to program synthesis, and can easily get "stuck", possibly taking hours and tens of gigabytes of RAM to synthesize small fragments of code. Since the default "miniScheme" language is dynamically typed, Barliman cannot take advantage of types to limit the space of programs to be considered during synthesis. There are other synthesis tools that can synthesize the complete definition of append, for example, given append's type signature along with tests that properly cover the behavior of append. (In fact, Michael Ballantyne has been able to synthesize append by integrating types into a tiny Scheme-like languge, which I'd like to explore in the context of Barliman.)

To me this is a tradeoff. Barliman is very flexible in its handling of languages and synthesis problems. At the same time, Barliman's synthesis is slow, which is why the tool is designed to work interactively with a programmer. I think this is a reasonable tradoff to explore, since there are plenty of dynamically-typed languages in use (Javascript, Python, Ruby, Scheme/Racket/Clojure/Lisp, etc.). Also, Barliman doesn't require that the user specify every test necessary to synthesize the complete definition of the function being considered, which reduces the burden on the programmer.

In short, Barliman is flexible, and can handle Turing-complete dynamically-typed higer-order languages, and under-specified synthesis problems, but the tradeoff is that Barliman's synthesis is slow.

Barliman works best for big-step operational semantics. It is possible to implement small-step semantics in Barliman. However, the synthesis features of Barliman are likely to work poorly compared with semantics written in a big-step style.

Similarly, Barliman works best for side-effect-free languages, such as a pure subset of Scheme. Once again, Barliman can handle languages with side effects, such as variable mutation. However, Barliman's synthesis abilities are likely to suffer as a result.

I do not know how large a language, or how large a definition, Barliman can handle in practice. I will be experimenting with this...

Barliman can be resource hungry. Given six example programs and a definition, Barliman will launch eight instances of Chez Scheme, all running in parallel. Barliman tries to kill these processes when they are not needed, but it is possible for these processes to run for long periods of time (like, forever) and take up unbounded amounts of RAM.

Barliman currently isn't a very good at standard text editing. For example, anyone used to paredit or structured text editing will miss those features in Barliman, at least for now. I do want to add these features to Barliman, especially since I expect they will make the synthesis aspects easier to explore.

Barliman currently doesn't support saving or loading files, definitions, tests, or anything else. I plan to add this feature soon.

Barliman is changing quickly, and definitely contains errors and interface quirks. To the best of my knowledge none of these problems are inherent in the design of Barliman, or the technology being used for synthesis. Still, since this is a rapidly evolving prototype, I expect I will be introducing errors about as quickly as I remove them, at least for a while.


How Barliman works

Barliman uses miniKanren (http://minikanren.org/), and a relational Scheme interpreter written in miniKanren (Byrd, Holk, and Friedman, 2012, http://dl.acm.org/citation.cfm?id=2661105 or http://webyrd.net/quines/quines.pdf), to provide real-time feedback to the code editor using program synthesis.

Chez Scheme in turn uses the miniKanren implementation and relational interpreter implementation contained in the mk-and-rel-interp directory.


The default "miniScheme" language

(give grammar and semantics for the default language)

As in Scheme, in miniScheme duplicate variable names of definitions at the same scoping level, or duplicate lambda or letrec bindings, are illegal. However, Barliman does not currently detect these violations. For example, Barliman will not complain about the expression ((lambda (x x) x) 3 4), the behavior of which is unspecified. Probably the parser should enforce that the variable names are distinct.

The lambda and letrec forms do not contain an implicit begin.

The lambda form supports multiple arguments, (lambda (x y z) y), and a single "variadic" argument, (lambda x x), but currently doesn't support the full Scheme variadic syntax, (lambda (x y . z) x).


Barliman implementation details

The cocoa version of the editor is written in Swift, and has been tested under OS X 10.11.4 and XCode 7.3.1. Eventually the editor will be crossplatform. I'm starting with cocoa since I'm developing on a Mac, and I want to make sure I don't box myself into a corner with the user interface/performance as I experiment with the design and the interface. The cocoa version of Barliman calls out to Chez Scheme (https://github.com/cisco/ChezScheme), which must be installed separately, and which is assumed to reside in /usr/local/bin/scheme.

IMPORTANT: The cocoa version of Barliman does its best to clean up the Scheme processes it launches. However, it is wise to run top -o cpu from the terminal after playing with Barliman, to make sure errant Scheme processes aren't running in the background. If these tasks are running, you can kill them using kill -9 <pid>, where <pid> is the process identifier listed from the top command.

To use the cocoa version of Barliman, first build and launch the application from XCode. The application currently has a single window, with a single editable pane. Enter Scheme expressions that run in the relational Scheme interpreter, such as:

((lambda (x) x) 5)

Each time the text changes, a Chez Scheme process will be launched, attempting to evaluate the expression in the relational Scheme interpreter. If the expression is not a valid Scheme s-expression, Chez will complain, an Baliman will display the code in red. If the expression is a legal s-expression, and the corresponding miniKanren query to the relational interpeter succeeds, the value of the query will be displayed below the editable text box. If the query fails, the empty list will be displayed in the text box.

For more interesting answers, you can use the logic variables A through G, upper-case. Make sure to unquote the logic variables:

((lambda (x) ,A) ,B)

Acknowledgements and thanks

Thanks to Michael Ballantyne, Kenichi Asai, Alan Borning, Nada Amin, Guannan Wei, Pierce Darragh, Alex Warth, Michael Adams, Tim Johnson, Evan Czaplicki, Stephanie Weirich, Molly Feldman, Joe Osborn, Nehal Patel, Andrea Magnorsky, Reid McKenzie, Emina Torlak, Chas Emerick, Martin Clausen, Devon Zuegel, Daniel Selifonov, Greg Rosenblatt, Michael Nielsen, David Kahn, Brian Mastenbrook, Orchid Hybrid, Rob Zinkov, Margaret Staples, Ziyao Wei, Matt Hammer, Hunter Hutchinson, Bryan Joseph, Cesar Marinho, Michael Bernstein, Bodil Stokke, Dan Friedman, Ron Garcia, Rich Hickey, Phil Wadler, Tom Gilray, Dakota Fisher, Gavin Whelan, Devon Zeugel, Jonas Kölker, Matt Might, participants of my 2016 PEPM tutorial on miniKanren, and particants of the 'As We May Thunk' group (http://webyrd.net/thunk.html), for suggestions, encouragement, and inspiration.

Thanks to Kent Dybvig, Andy Keep, and Cisco Systems for releasing Chez Scheme under an open source license.

The definition of letrec in the main interpreter is based based on Dan Friedman's code, using the "half-closure" approach from Reynold's definitional interpreters.

Greg Rosenblatt has been improving the search and the miniScheme interpreter to improve synthesis performance, greatly improving performance on many of the synthesis problems.

Barliman is intended to be an improved version of the very crude 'miniKanren playground' I showed at my 2016 PEPM tutorial on miniKanren: https://github.com/webyrd/minikanren-playground


Barliman TODOs and limitations

TODO:

LONGER TERM:

POSSIBLE USE CASES:

You write tests and code. The tests pass. Later you find an error in the code, so you go back and add more tests, which fail.

Click a Barliman 'auto-repair' button. Barliman tries, in parallel, removing each subexpression and trying synthesis to fill in the rest.

If Barliman could use a Amazon server with dozens of hardware cores and 2TB RAM (like the new X1 server on AWS), this really could be done in parallel.

Or run locally until there's a timeout, then run again with the holes in other places. Could even try pairs of holes to keep the synthesis problem as small as possible.

Or, perhaps more practical short term until Barliman's synthesis improves...

Have Barliman try removing each subexpression and then check if any of the tests still fail. Then hilight these known bad subexpressions to help guide the user.

Greg Rosenblatt's suggestion for auto-repair: "The user may also want to mark some regions of the code as suspect, which would prioritize the area searched for problematic sub-expressions. If the user is right, the fix could be found much sooner."

SUSPECT IDEAS:

INTERESTING IDEAS:

KNOWN LIMITATIONS:

KNOWN ERRORS:

DONE (features on the TODO list implemented since the original release of Barliman)