rfc: extern and call instructions

This PR was jointly authored by engineers from the Rigetti engineering team and I am opening this PR on our collective behalf. We hope the document should largely speak for itself.

We look forward to getting some high-level feedback on the prospects of the document's proposed changes. From there we can address desired modifications to the proposed features and syntactical details.

@stylewarning This is ready for review now.

Hey @erichulburd, this is very nice and well written. I think it's largely good. One of my main questions is one of simplifying things:

Does EXTERN provide much beyond user convenience and error checking possibilities?
What do you think about combining EXTERN and CALL with something like FOREIGN that allows more or less any free-form call to any named entity?

It's hopeless to assign semantics to an extern call anyway, except to say the call happens prior or subsequent other instructions, and we might recognize this fact as a matter of simplification:

DECLARE inputs REAL[3]
DECLARE outputs REAL[3]
EXTERN MY-FUNCTION(REAL[3]) REAL[3]
CALL outputs MY-FUNCTION(inputs)

becomes something like

FOREIGN "MY-FUNCTION" outputs inputs

The grammar would basically be

FOREIGN <procedure-name> (<Memory Reference> | <literal>)*

<procedure-name> = <string>

This would not permit QUILC to check anything, but rather just pass things through and let any backend compiler decide on the validity of instructions. This is similar in spirit to the PRAGMA syntax, but foreign calls would not be able to be eliminated or re-arranged.

Thoughts?

Does EXTERN provide much beyond user convenience and error checking possibilities?

No. It's there for program readability and system / compiler validation of later CALL instructions only.

What do you think about combining EXTERN and CALL with something like FOREIGN that allows more or less any free-form call to any named entity?

It's hopeless to assign semantics to an extern call anyway, except to say the call happens prior or subsequent other instructions, and we might recognize this fact as a matter of simplification:

EXTERN is a C-like declaration that need only be included in the program once (in fact, our proposal makes it illegal to include redundant EXTERN declarations). So subsequent CALL instructions, in effect, are loosely equivalent to the FOREIGN suggestion (with a minor difference in the arrangement of inputs and outputs, which I comment on toward the end of the response).

My personal view is that the readability and validation that EXTERN supports is worth explicit specification mostly because it provides a clear means of a system / backend to communicate to the user what foreign functions are available in the Quil language itself.

Sure, this could be communicated in a payload conforming to some arbitrary schema (gRPC, JSON, etc), but what would be the source of truth for that schema? It seems less worthwhile for Quil to specify such a schema, say in JSON, because that makes assumptions about the message passing of backends. A Quil specification for EXTERN, on the other hand, is guaranteed consistent across backends, so long as they support the Quil specification itself.

Less significantly, I'd point out:

EXTERN provides a clear point to disambiguate overloaded functions. If a backend supports both CHOOSE(options REAL[10]) REAL and CHOOSE(options REAL[]) REAL and the user only includes an EXTERN for the latter, it is clear which implementation to use. This example, of course, presupposes support for overloading.
EXTERN provides a means to alias functions, which provide a focused means to account for different instruction implementations across backends. While, this isn't necessary for instructions like MUL or ADD, it is important for mathematically ambiguous instructions like RAND (maybe this is what you meant by "it's hopeless to assign semantics to an extern call"?). For instance, suppose backend A supports RAND(seed REAL) REAL , but backend B supports LFSR(seed REAL) REAL; both implementations meet the user's requirements. The Quil program need not be re-written in its entirety for each backend. Instead, the user just maintains separate EXTERN statements for each backend. The goal here isn't to support the ability of Quil or backends to strictly assign semantics; the goal is to enable the user to better navigate semantic ambiguity.

All of that said:

I think we could loosen the requirement that any CALL must follow a corresponding EXTERN instruction to should follow.
I don't mind changing CALL to FOREIGN, though I like that CALL is consistent with the conventions of other notable IRs (LLVM, QIR, MLIR, etc). Also I think we want the function identifier to separate inputs or outputs in order to disambiguate conflicting signatures such as MY-FUNCTION(REAL, REAL) REAL and MY-FUNCTION(REAL) (REAL, REAL).

@stylewarning and myself have been giving this some thought.

The spirit of the RFC under consideration seems to me to focus on one simple desire: to support the inclusion of more classical functions into Quil programs. I think this desire can met with greater parsimony than what this RFC requires. I propose a simpler and more focused change that still meets what I take to be the this RFC's value proposition.

Bullet points summarizing the change are:

Dispense with the EXTERN keyword in favor of a PRAGMA for declaring external function types
Simplify the CALL form's syntax in a couple of ways

A `PRAGMA` for Declaring External Functions

This pragma would resemble the RFC's EXTERN, but would simplify argument declaration:

PRAGMA EXTERN identifier type*

For example


PRAGMA EXTERN boundedRandom REAL REAL REAL

This would declare the type signature a function called boundedRandom. It would be up to the implementor to make sense of this function's effect on its three arguments. Presumably, the function would generate a random number bounded by two of those arguments, writing that random number into the third argument.

Rationale

My rationale for preferring PRAGMAs is that their inclusion is optional from the perspective of program semantics. The purpose of a PRAGMA is to inform the compiler about additional information that it may take advantage of during compilation. Excluding a pramga should have no disercnable effect on program semantics. Should users wish to declare the externs they wish to use, then the compiler can check those declared functions at their call sites.

On the Omission of `mut` and Named Parameters

The mut syntax seems superfluous at best. If the purpose of declaring external functions is to allow the compiler to perform simple call site correspondence checks on the arguments to a call, then mut serves no purpose. Quil does not encode the mutability of declared memory in the type of that memory. Perhaps this change ought to follow in a subesquent RFC should its utility be made more obvious.

The omission of named parameters is proposed in order to simplify the software support required by these changes. If users strongly prefer the "self-documenting" quality of named parameters, I am not strongly opposed to re-admitting them.

A simplified `CALL` Keyword Syntax

In lockstep with the syntax of the above PRAGMA, ths syntax of CALL can also be simplified.

For example, calling boundedRandom would appear as follows:


CALL boundedRandom 1.0 10.0 randval

This proposal removes the distinction between "input" and "output", shifting all responsibility for juggling values in classical memory onto the implementor of boundedRandom, for example.

Call correspondence checks would now first check for the presence of an extern pragma whose identifier matches that of the CALL's first argument. If no such pragma is present, then the call's appearnce is assumed to be correct.

Rationale

The effect of any CALL on the program semantics is to, essentially, increment the program counter and to possibly alter the state of classical memory. Distinguishing inputs from outputs in the syntax of the CALL is superfluous to supporting externs and only serves to complicate the syntax: external functions are always free to mutate their inputs, and so there is no good reason to prefer distinct syntax for designation of outputs.

edited for (I hope) clarity

Thanks for the thoughts @macrologist . My responses are inline below.

I think this desire can met with greater parsimony than what this RFC requires.

I don't believe excluding complex issues from the RFC simplifies the design. Rather, it defers address of the complex issues, which has the unfortunate consequence of rendering the RFC ambiguous, creating disagreement between implementations. For instance, WRT boundedRandom REAL REAL REAL, how should implementations come to agreement about the semantic meaning of these arguments? The executed programs are very dependent upon the order of the arguments. Yes, the syntax is lighter, but dodging issue complexity does not stack up to a simpler design when you consider an ecosystem of Quil backends and authors.

The existing Quil spec itself makes clear which memory operands are sources and which are destinations. This RFC is consistent with the existing convention that names destination addresses before source addresses (e.g. ADD a b # a := a + b).

My rationale for preferring PRAGMAs is that their inclusion is optional from the perspective of program semantics.

I think there is some truth to this and I don't take major issue with PRAGMA preceding EXTERN. However, I'm not convinced that's the optimal solution. For one, PRAGMA is already a keyword and its existing syntax confounds what we are trying to accomplish with a structured EXTERN declaration. I think a better route here is for implementations to ignore EXTERN like they would a PRAGMA.

Additionally, consider that a similar argument to the one you make about EXTERN could be made for DECLARE . DECLARE does not amount to any runtime instruction. It is there to help the compiler plan memory layout. Similarly, the EXTERN is there to help the compiler identify, load, or check runtime dependencies. This is a pretty common construct in both programming languages and IRs.

The mut syntax seems superfluous at best. If the purpose of declaring external functions is to allow the compiler to perform simple call site correspondence checks on the arguments to a call, then mut serves no purpose.

The intention here is to aid the compiler (and reader) in program scheduling, rather than perform call site correspondence. Take two consecutive instructions, A and B. If A writes to memory region b and B reads from it, it is necessary that instruction A completes before instruction B reads from b; otherwise, there is no hard memory or timing dependencies between them; they could effectively read separate copies if neither instruction writes to b.

Again, this amounts to "compiler help"; it doesn't affect program semantics. However, that is the case more broadly embraced in the design of EXTERN we've submitted and implementations can be free to ignore the information.

I'd refer you to this note about why declaring structured APIs right within Quil is better than deferring to disparate implementations. If they are to come to an agreement about the structure of a functional call, it's best to keep it in a guaranteed common language rather than add additional dependencies across different systems and programming languages.

Distinguishing inputs from outputs in the syntax of the CALL is superfluous to supporting externs and only serves to complicate the syntax

I'd argue, you've taken away two characters (( and )) and introduced more ambiguity both to the reader and the compiler. Again, this is consistent with the existing Quil spec (which clearly distinguishes between source and destination operands) and the perspective that disparate implementations should have a means of clearly communicating the structure of their APIs.

Edit: If we really want to simplify the call, I would say dropping the input / output distinction (which is really only there to conform to common programming language and IR convention), does not come at the cost of any information expressivity as long as we have named parameters and the ability to annotate them as mut (or inout, or whatever your preference may be).

After discussion here and elsewhere on the changes proposed by the RFC, here is something that I believe we are all in a position to move forward with.

We introduce two keywords EXTERN and CALL, and we introduce a PRAGMA for specifying function signatures of extern functions.

The new syntax could look something like this:

<Extern> := 'EXTERN'  <Identifier> <Integer>?
<Call> := 'CALL' <Identifier> <ExternArgument>*
<ExternArgument> := <Memory Reference> | <Complex>

where <Identifier>, <Integer>, <Memory Reference>, and <Complex> accord with the Quil spec.

The EXTERN keyword specifies a name for the external function as well as an optional arity. The CALL keyword signifes an instruction, and specifies the name of the extern function to call as well as a list of arguments, each of which can be either a memory reference or a number. [Sidebar: Maybe the <Extern Argument> pattern should be expanded.]

The exern function signature PRAGMA might look something like one of these:

PRAGMA some_extern x y z  "(mut Integer, Integer, Integer)"

PRAGMA some_extern "(x : mut Integer, y : Integer, z : Integer)"

Or something else. Assuming the second option is chosen, the syntax of the type string might be:

<TypeString> := '"' s* '(' s* <ExternParameters>? s* ')' s* '"'
<ExternParameters> := <ExternParameter> | <ExternParameter> s* ',' s* <ExternParameters>
<ExternParameter> := <Identifier> s* ':' s* 'mut'? s+ <Identifier>

Another option is augmenting the PRAGMA syntax itself to permit something closer to that extern parameter typ syntax.

This is looking pretty close. A few comments:

<Extern> := 'EXTERN' <Identifier> <Integer>?

The arity is obstructive here if we eventually want to include the <TypeString> signature in the EXTERN instruction (as mentioned on the call by @stylewarning last month). For instance if a Quil program has EXTERN prng 1, the forward compatibility is a bit messy with EXTERN prng "(x : mut Integer)". Of course, a parser could check for the arity first and, failing that, try to parse the signature, but it's cleaner IMO to have nothing.

In the immediate term, we would omit the arity and include the extern function signature pragma as the arity is redundant to the latter.

<ExternArgument> := <Memory Reference> | <Complex>

I would prefer <Identifier> | <Memory Reference> | <Complex> here. Although <Memory Reference> includes <Identifier>, the spec states that is for reference to the single element in a memory region of length 1. Inclusion of <Identifier> explicitly supports the notion of passing an entire memory region to the function.

<TypeString> := '"' s* '(' s* <ExternParameters>? s* ')' s* '"'

This looks good, although I would probably move the EXTERN name inside the signature and identify the pragma with its own <Identifier> to avoid naming collisions. So:

<TypeString> := '"' s* <Identifier> s* '(' s* <ExternParameters>? s* ')' s* '"'

Then "extern" would represent a reserved pragma identifier (could also be "declare-extern"). Your example then becomes:

PRAGMA extern "some_extern (x : mut Integer, y : Integer, z : Integer)"

I would also lobby again for support a single return <Complex>. Something along the lines of:

<TypeString> := '"' s* <Complex>? s* <Identifier> s* '(' s* <ExternParameters>? s* ')' s*'"'

If that leading <Complex> is specified, then the first < ExternArgument> must be <MemoryReference> where the result is written to (first is again in keeping with the existing <destination> <source> pattern in Quil).

We'd also like explicit mention that such functions could be called within <Expression>s. This is a feature that is particularly appealing to those with a stack-based control system. Consider the current <Expression> <Term> ⟨Identifier⟩ ( ⟨Expression⟩ ). This allows for syntax such as RZ(SQRT(theta)) 0 (see Quil.g4). The idea we want to consider here is to support arbitrary function calls within expressions. If, for example, you have a random number generator, and you don't necessarily want to store the random number in memory and then later fetch it, you might have an extern signature EXTERN prng (seed: mut Integer). Now to use this in an operation such as CALL prng(random); RZ(random) 0;, you'll need an implicit store / fetch from memory (unless your compiler can do the necessary analysis to keep, use, and drop a temporary memory value). Now if Quil supported something like REAL prng (seed : Integer), a naive compiler can omit the store / fetch if the program contains RZ(prng(seed)) 0.

<ExternParameter> := <Identifier> s* ':' s* 'mut'? s+ <Identifier>

If we want to make function argument names optional, this would become: (<Identifier> s* ':' s*)? 'mut'? s+ <Identifier>. No strong feelings on this matter, however.
The second <Identifier> should be <Base Type> ("[" , Integer? , "]")? for explicitness.
- If Integer? is omitted, then the argument may be of variable length.
- if ("[" , Integer? , "]") is omitted, then the argument must be an immediate value (i.e. a compile time constant).

Other topics from the RFC not mentioned here:

Both function namespacing and aliasing can be omitted. Those are definitely nice-to-haves and what we have here is forward compatible with adding those in the future.
Because we are not returning values, there isn't support for explicitly dropping values using "_". I think we could address this by marking some arguments as optional. Adding support for such a modifier is forward compatible (e.g. "REAL[10]?" or "OPTIONAL REAL[10]"), so I believe it's fine to leave out for now.
For reasons stated in the RFC, I think discussion of overloading bears mention in the spec (or at least a pointer to the RFC). With the proposed changes, I would modify the RFC requirements for resolving overloaded functions as:

Consider the following relationships between parameter and return type sets:

$BaseType \subset BaseType[1] \subset BaseType[]$
$BaseType[Integer] \subset BaseType[]$

More specific types take precedence over less specific types (ie any type takes precedence over a type of which it is a subset).
Any parameter takes precedence over a parameter to its right.
Declared signatures with equivalent type sets are strictly forbidden. This applies regardless of textual representation. Note, a parameter of type mut <Base Type> is considered equivalent to type <Base Type>.

Summing up, here is the formal counter-proposal:

<Extern> := 'EXTERN'  <Identifier>

<ExternParameter> := (<Identifier> s* ':' s*)? 'mut'? s+ <Base Type> ("[" , Integer? , "]")?
<ExternParameters> := <ExternParameter> | <ExternParameter> s* ',' s* <ExternParameters>
<TypeString> := '"'(s* <Complex>)? s* <Identifier> s* '(' s* <ExternParameters>? s* ')' s*'"

<ExternArgument> := <Identifier> | <Memory Reference> | <Complex>
<Call> := 'CALL' <Identifier> <ExternArgument>*

This looks good to me!

There are a few points I don't understand:

If that leading < Complex > is specified, then the first < ExternArgument> must be where the result is written to (first is again in keeping with the existing pattern in Quil).

Let's first suppose that some of the sketched modifications are adopted, leaving us with pragmas that look like:

PRAGMA extern "3.1415 rando (seedary : mut REAL[])"

I may be mistaken, but I think my first confusion might be due to a typo. Did you mean to write the following:

(s* <Type> s)? s* <Identifier> s* '(' s* <ExternParameters>? s* ')' s*

? That is, I expect that the initial optional syntax is meant to express return type. Let me know if I'm totally lost.

A second concern:

Now to use this in an operation such as CALL prng(random); RZ(random) 0;, you'll need an implicit store / fetch from memory (unless your compiler can do the necessary analysis to keep, use, and drop a temporary memory value). Now if Quil supported something like REAL prng (seed : Integer), a naive compiler can omit the store / fetch if the program contains RZ(prng(seed)) 0.

I'm not sure I agree with this. I think that something like a store and fetch will need to be included either way (either inserted by the compiler or literally included by the programmer). Take this example:

(compiler-hook (parse-quil "DECLARE x REAL; CPHASE(x) 0 1") (build-8q-chip))

Because the parameter to the cphase includes an expression that is unresolvable at compile time, this produces code that includes copies of the expression:

DECLARE x REAL

RZ(-pi/2) 1                             # Entering rewiring: #(0 1 2 3 4 5 6 7)
RX(pi/2) 1
RZ(pi/2) 1
CZ 1 0
RZ(-pi/2) 1
RX(-pi/2) 1
RZ(((-0.5)*x[0])) 1
RX(pi/2) 1
RZ(3*pi/2) 1
RZ(pi) 0
CZ 0 1
RZ(-pi) 0
RZ(-pi/2) 1
RX(pi/2) 1
RZ(((-pi/2)+((0.5)*x[0]))) 1
RZ(((0.5)*x[0])) 0
HALT                                    # Exiting rewiring: #(0 1 2 3 4 5 6 7)

But imagine if x[0] were prng(seed[0]). Then each call to the function would, presumably, produce a new random number, destroying the correctness of the emitted code. The most immediately obvious way to get around this would be to have the compiler do some work to translate that extern-in-an-expression instance into to a CALL that writes to some temporary "scratch pad" memory location and is then referenced in each expanded expression. In the most naive way to support this feature, the code would be scanned for such expressions so that additional memory locations could be declared. A more sophisticated approach might be a able to "manage" that "scratch pad" memory to minimize the declarations required, but doing so in a general way isnt obviously trivial. But besides implementation costs (which sound fun to me), it would result in a program that is less obvious to readers: you might not know from looking at it what the memory requirements of your program are. But maybe I am misinterpreting the intent or am suggesting an inferior solution to the "copying" problem my interpretation raises.

Similar concerns apply when these extern-containing-expressions appear in a DEFGATE.

 DEFGATE WONDERING(%x):
    prng(%x)  prng(%x)
    prng(%x)  prng(%x)

Are these calls meant to evaluate to the same thing? In what order are they meant to evaluate? etc?

Other than these "expression extensions" issues, I think I'm on board with the proposal.

@macrologist

That is, I expect that the initial optional syntax is meant to express return type. Let me know if I'm totally lost.

Yes. It should be an identifier, or more specifically, a <Base Type>, so:

<TypeString> := '"'(s* <Base Type>)? s* <Identifier> s* '(' s* <ExternParameters>? s* ')' s*'"

Other than these "expression extensions" issues, I think I'm on board with the proposal.

Your CPHASE example could be written in a couple of valid ways, one of which is more optimal than the other.

DECLARE x REAL;
DECLARE seed REAL;
EXTERN prng
PRAGMA declare-extern "REAL prng(seed : REAL)"
CPHASE(prng(seed)) 0 1

I see your point here that, because CPHASE is compiled in such a way that the prng(seed) is used in two separate native instructions, this first example, while valid, is non-optimal. My understanding of this program is that the native compilation would call the prng(seed) routine on the hardware for each native gate that uses its result. However, the program it yields is well-defined because prng(seed) is deterministic and the seed value in this case is not mutated.

A better way to write this program would thus be:

DECLARE x REAL;
DECLARE seed REAL;
EXTERN prng
PRAGMA extern "REAL prng(seed : REAL)"
CALL prng x seed
CPHASE(x) 0 1

Here prng(seed) is called only once.

The only case where its optimal to use an extern function in an expression is when you're dealing with an instruction that is native and close to the hardware (ie the expression only needs to be evaluated once, or its result can be managed in non-heap memory).

At the latest Quil meeting, @macrologist suggested the following additional restrictions to the spec:

Just as all other currently supported Quil expressions, expressions with externed functions must be read-only. They cannot contain mutable parameters.
Usage of extern functions within Quil expressions require the associated pragma type string declaration.

Those restrictions both make sense to me.

quil-lang / quil