probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

Propose ${} as syntactic sugar for unquote in VentureScript #234

Closed axch closed 8 years ago

axch commented 8 years ago

Is there a convention for quote and quasiquote?

lenaqr commented 8 years ago

I think I proposed backticks in the past and @riastradh-probcomp expressed distaste for non-nestable delimiters.

If we want to borrow from other languages, apparently Ruby has this strange notation: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#The_.25_Notation which might suggest %q{expr} -> quote(expr) and %qq{expr} -> quasiquote(expr). I feel like this basically amounts to defining shorter names for quote and quasiquote though.

Another idea, I'm not sure how well this would play with the VentureScript grammar, but just use single quote and backtick as they are used in Scheme, and have that swallow the smallest complete next expression that follows, using parentheses when necessary to force a particular interpretation.

A third idea: '[...] and `[...], or '{...} and `{...} (bleh, getting that to render right in github was perhaps an object lesson against non-nestable delimiters)

axch commented 8 years ago

I believe Haskell, the other syntax-challenged language I know that has this feature for structures rather than just strings, uses [|...|] for quasiquote and doesn't have non-quasi quote. Standard unquote is either $(expr) or $identifier. (Apparently they also have [name-of-parser-function|...|] for introducing what in Scheme would be thought of as reader extensions.)

axch commented 8 years ago

We could also define ' (quote), , (unquote), and ` (quasiquote) as prefix operators with some particular precedence other than "tightest". In particular, it might make sense to have them bind less tightly than function application, and possibly also than arithmetic. If they bind loosely, though, the way to force scope becomes to put parens around the quote on the outside, as (..... (, ..... ) .... ), which would look quite odd to a Schemer.

Also, using comma for unquote would be somewhat non-traditional in a syntax that's supposed to be Javascript-like.

vkmvkmvkmvkm commented 8 years ago

I like the Haskell solution, and I also think it's important long term for us to embrace reader extensions now (and demo them, which we should talk about).

vkmvkmvkmvkm commented 8 years ago

I think it is optional for the PPAML PI meeting deliverable but high priority otherwise.

axch commented 8 years ago

For the sake of engineering sanity, I hereby declare that user-authored reader extensions can wait until #80 is done.

riastradh-probcomp commented 8 years ago

By the way, E has expression quasiquotation in an infix, Algol-style syntax:

http://www.erights.org/elang/grammar/quasi-overview.html

marcoct commented 8 years ago

This is very important for publishable VentureScript code. The unquote keyword in the code is distracting.

I like the solution that bears resemblance to bash's $(). Julia also uses this notation for string interpolation.

This is relevant to the NIPS May 20 deadline.

marcoct commented 8 years ago

Actually, upon second thought, it is difficult to expunge the quasiquote from user code in my framework. I think a solution for both quasiquote and unquote is needed.

Personally, I like the following solution too:

  1. [|..|] for quasiquote
  2. $(..) for unquote.

marcoct commented 8 years ago

There is a related issue of programmatically forming symbols for use in the modeling environment. For example, in my current framework the user specifies a model program using a block of assumes, and a definition for expressions which will be observed:

// USER CODE (make_model bundles together these into a dict)
define x_coords = array(-2, -1, 0, 1, 2);
define model_program = make_model(
  // assumes
  do(
    assume(a, normal(0, 2)),
    assume(b, normal(0, 2)),
    assume(line, proc(x) { a + b * x })),

  // observed expressions
  proc(t) {
    x = lookup(x_coords, t);
    quasiquote(normal(line(unquote(x)), 1))
  },

  // number of observations
  size(x_coords)
);

Behind the scenes the observed expressions get assumed when necessary:

// NON-USER CODE
define do_assume_observations = proc(model) {
    obs_expressions = lookup(model, "observed_expressions");
    num_observes = lookup(model, "num_observes");
    obs_symbols = proc(t) { make_symbol("obs", t) };
    mapM(
        proc(t) {
            assume(
                unquote(obs_symbols(t)), 
                unquote(obs_expressions(t)))
        },  
        arange(num_observes))
};

Currently I do this using a foreign inference SP make_symbol whose type signature takes [t.SymbolType(), t.NumberType()] and returns t.SymbolType().

An interesting feature that would simplify programmatically generating such symbols would be to permit:

    assume(obs_$t, ..)

Any token in a model expression that includes $(..) is interpreted as building up a symbol using evaluations in the inference environment.

With proposed syntactic sugar for quasiquote and unquote - a huge improvement

// USER CODE
...
  // observation model
  proc(t) {
    x = lookup(x_coords, t);
    [| normal(line($x), 1) |]
  },
...
// NON-USER CODE
assume(unquote(expressions(t))

With the ability to build up modeling environment symbols using the unquote syntax - the user no longer needs to use quasiquote at all, because they've bypassed the need to pass modeling expressions around:

// USER CODE
...
  // observation model (user code)                                                                                            
  proc(t) {                                                                                                                   
    x = lookup(x_coords, t);                                                                                                  
    assume(obs_$t, normal(line($x), 1))
  },

This may be a special case, but seems related to this ticket. A similar feature for building observation labels programmatically using $ would be useful. However, I'm not sure where the boundaries between quasiquote/unquote and string interpolation lie.

axch commented 8 years ago

What syntactic sugar, if any, do we want to add to VentureScript for non-quasi quote? We are rapidly running out of nestable delimiters: round brackets are for function application, curlies are for code blocks, square brackets are conventional array literal syntax, and we are already proposing Oxford brackets ([| |]) for quasiquotation.

Why does VentureScript even need non-quasi quote? So that quasiquoted expressions can effectively emit list and array literals:

[| lookup(quote(unquote(map(f, lst))), 1) |]  // expands to lookup(quote(1(2, 3)), 1)

The quote is necessary there because an unquoted list would be interpreted as expression structure and evaluated: lookup(1(2, 3), 1), which is no good.
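
The problem can be reproduced in a small Python model, assuming expressions are nested lists whose first element is the operator. The names expand_quasiquote, render, and evaluate are made up for this sketch, and evaluate is a stub that pretends map(f, lst) returns the list 1, 2, 3.

```python
def expand_quasiquote(body, evaluate):
    """Return body with every ["unquote", e] form replaced by evaluate(e)."""
    if isinstance(body, list):
        if len(body) == 2 and body[0] == "unquote":
            return evaluate(body[1])
        return [expand_quasiquote(part, evaluate) for part in body]
    return body

def render(expr):
    """Print a nested-list expression in call syntax, head(arg, ...)."""
    if isinstance(expr, list):
        head, args = expr[0], expr[1:]
        return "%s(%s)" % (render(head), ", ".join(render(a) for a in args))
    return str(expr)

def evaluate(expr):
    # Stub for the inference environment: map(f, lst) yields [1, 2, 3].
    assert expr == ["map", "f", "lst"]
    return [1, 2, 3]

splice = ["unquote", ["map", "f", "lst"]]
bare = expand_quasiquote(["lookup", splice, 1], evaluate)
quoted = expand_quasiquote(["lookup", ["quote", splice], 1], evaluate)

bare_text = render(bare)      # the spliced list reads as an application
quoted_text = render(quoted)  # the quote keeps it as data
```

In the bare case the spliced list is indistinguishable from expression structure, which is exactly the failure described above.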

In explicitly parenthesized languages, this is not a serious problem: since all expressions are guaranteed to nest anyway, the single tokens backtick, quote, comma solve this problem. Since VentureScript is implicitly parenthesized, though, we need nestable delimiters for all these things.

None of the infix languages I have available as models have syntactic sugar for non-quasi quotation, because their abstract syntax trees are not lists, so they do not expect users to have this problem very much.

I don't want to use nested quasiquotation for this, because the standard semantics for quasiquote nested inside quasiquote is to require one more level of unquotation to actually evaluate, so that quasiquote may be used to programmatically construct quasiquotations.
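
For reference, the level-counting behaviour of nested quasiquote can be sketched in Python over the same kind of nested-list expressions. This follows the Scheme convention as I understand it and is not Venture's implementation; expand_nested and the constant evaluator are hypothetical.

```python
def expand_nested(body, evaluate, depth=1):
    """Expand a quasiquote body, evaluating unquotes only at depth 1."""
    if not isinstance(body, list):
        return body
    if len(body) == 2 and body[0] == "unquote":
        if depth == 1:
            return evaluate(body[1])
        # Inside a nested quasiquote: keep the unquote, go one level down.
        return ["unquote", expand_nested(body[1], evaluate, depth - 1)]
    if len(body) == 2 and body[0] == "quasiquote":
        # A nested quasiquote raises the level by one.
        return ["quasiquote", expand_nested(body[1], evaluate, depth + 1)]
    return [expand_nested(part, evaluate, depth) for part in body]

def evaluate(expr):
    # Stub evaluator: every unquoted expression evaluates to 42.
    return 42

flat = expand_nested(["f", ["unquote", "x"]], evaluate)
single = expand_nested(["quasiquote", ["g", ["unquote", "x"]]], evaluate)
double = expand_nested(
    ["quasiquote", ["g", ["unquote", ["unquote", "x"]]]], evaluate)
```

A single unquote inside the inner quasiquote is left untouched, while a double unquote has its innermost expression evaluated. That extra level is why nested quasiquote cannot double as plain quote.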

Proposal: Do not add syntactic sugar for quote as such. In cases that do not involve nested unquotation, one can use quasiquote instead. In cases that do, cover common scenarios by adding syntactic sugar for the quote(unquote(...)) construct. Concretely, write ${...} for quote(unquote(...)).

This would produce, for the motivating case,

[| lookup(${map(f, lst)}, 2) |] // lookup(quote(1(2, 3)), 2)
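
Reading ${...} as sugar for quote(unquote(...)), alongside [| |] for quasiquote and $( ) for unquote, the desugaring step could look roughly like this in Python. The node tags and the desugar function are invented for the sketch; a real parser would produce its own representation.

```python
# Hypothetical tags a parser might attach to the three sugared forms.
SUGAR = {
    "[| |]": lambda e: ["quasiquote", e],
    "$( )": lambda e: ["unquote", e],
    "${ }": lambda e: ["quote", ["unquote", e]],
}

def desugar(expr):
    """Rewrite sugared nodes into explicit quasiquote/unquote/quote forms."""
    if not isinstance(expr, list):
        return expr
    if len(expr) == 2 and isinstance(expr[0], str) and expr[0] in SUGAR:
        return SUGAR[expr[0]](desugar(expr[1]))
    return [desugar(part) for part in expr]

# [| lookup(${map(f, lst)}, 2) |] as a tagged nested list.
sugared = ["[| |]", ["lookup", ["${ }", ["map", "f", "lst"]], 2]]
explicit = desugar(sugared)
```

Expanding the resulting quasiquote with map(f, lst) evaluating to the list 1, 2, 3 then gives the quoted form shown in the comment above.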

Alternatives considered:

[| lookup([L| ${map(f, lst)} |], 2) |] // lookup(quote(1(2, 3)), 2)
[| lookup($L{map(f, lst)}, 2) |] // lookup(quote(1(2, 3)), 2)

The current proposal is favored because it optimizes for the case expected to be common for relatively novice Venture users, namely interpolating computed data literals (which should be quoted) into programmatically constructed expressions (which are to be evaluated in the model).