pre-srfi / static-scheme

11 stars 0 forks source link

Source language compatibility between Steme and Scheme #16

Open mnieper opened 3 years ago

mnieper commented 3 years ago

Given a Steme program without any type annotations, how much do we want to allow it to be different from a Scheme program? For example, do we want to have the same procedures with more or less exactly the same arguments so that the code could also be understood by an ordinary Scheme implementation?

While this sounds very attractive at first glance, it would limit the design space for Steme a lot. For example, in such a Steme, we really couldn't retrofit impure procedures to become internally pure (e.g. the Haskell or Clean approach couldn't work).

For that reason alone, I would like to abandon the idea. Moreover, it is not at all clear whether the idea works within the limits of a realistic and usable type system. For example, cons should be able to produce pairs where both factors have different types. In ordinary Scheme, cons is also used to prepend elements to lists. These lists will have to be most likely homogeneous in Steme, so it is questionable whether the same cons can be used for both purposes in Steme, so we probably have to invent a few new core procedures anyway.

lassik commented 3 years ago

With the big caveat that I'm biased about this issue probably more than about any other issue, approximate source compatibility would be the most attractive feature of the language to me. It would make it so schemers can start using the language with little ramp-up, and would make it easy to translate to Scheme and other languages.

A good principle is "Steme code does the same thing as the equivalent Scheme code, but not all Scheme code can be written in Steme". For example, cons cannot make dotted pairs or lists of mixed types. The type inference algorithm posted to c.l.scheme a while back had separate cons and pair types, cons being restricted to monotype lists and pair being a classical Lisp pair. If Steme has a Datum type, pair could contain two Datums.

lassik commented 3 years ago

The type inference algorithm posted to c.l.scheme a while back

I added the code to this repo since it's easier than accessing usenet nowadays: r4rs-hindley-milner.scm

mnieper commented 3 years ago

Cons could be resolved in this way. But this would only help oneway: even a pure Scheme program cannot usually be run as Steme program because some conses would have to replaced with pairs.

But it doesn't stop there. Quasiquotation would have to decide whether it uses cons or pair, which is not at all clear because sometimes I want to build lists, some times more general Scheme datums using quasiquotation.

It has also to be decided what has to be done with procedures like write, which, as written, make no sense in pure context. Do we allow to reuse the name write for a pure version of write, which will necessarily have arguments (or return values) that differ from those of ordinary write.

lassik commented 3 years ago

You're right, the quasiquotation thing is not going to be fun... perhaps it could be resolved by a separate syntax-quote like #'. This is the point where we depart the elegant simplicity of syntax-rules :)

One option is to have the macro language dynamically typed. It doesn't have to be Scheme; it can be a pure language whose surface syntax looks like Scheme.

mnieper commented 3 years ago

I am confused. Quasiquotation is not tied to macros.

lassik commented 3 years ago

We'd have quasiquotation in both static language and the macro language. Both would use the same backquote syntax familiar from Scheme, but would have different semantics: when backquoting alist, the macro language would always make something of type (list datum) whereas the static language would make (list 'a) where 'a is the type of its elements and could be inferred.

In fact, if the macro language always deals with Datums, maybe it doesn't need to be dynamically typed? Not sure.

lassik commented 3 years ago

And the ordinary quote operator would work on the same principle.

'(+ foo 1) written in the macro language would be equivalent to (list (Symbol "+") (Symbol "foo") (Integer 1)) written in the main language, assuming something like (define-union-type Datum (Symbol string) (Integer integer))

lassik commented 3 years ago

In the main language '(+ foo 1) would cause a syntax error since that language would interpret the elements as two symbols and one integer, not as three datums. If you wanted to make datums there, you'd have to explicitly write the names of the value constructors.

mnieper commented 3 years ago

If a (procedural) macro transformer is written in Steme, it could work on an abstract Datum type. It would make sense to have a version of quasiquote that constructs Datum objects. One could retrofit syntax-case and syntax/quasisyntax so that they work on (wrapped) Datums instead of (wrapped) naked (nested) lists. But this is still a long way off and is probably not needed to find an answer to how quasiquote should work in general.

mnieper commented 3 years ago

In the main language '(+ foo 1) would cause a syntax error since that language would interpret the elements as two symbols and one integer, not as three datums. If you wanted to make datums there, you'd have to explicitly write the names of the value constructors.

Can you explain this in detail?

lassik commented 3 years ago

In the main language '(1 2 3) would be equivalent to (list 1 2 3) and list would be a macro such that that expands to (cons 1 (cons 2 (cons 3 null))). The type of cons is (-> 'a (list 'a) (list 'a)).

In the macro language, typing 123 or "foo" does not have the same meaning as in the main language; in the main language those make an integer or a string, respectively. In the macro language those make an integer datum and a string datum; i.e. every literal value is implicitly wrapped in a datum.

This would imply that operators like + or string-prefix? in the macro language would have to implicitly unwrap the datums to do their job. And give back their return values in a datum-wrapped form.

lassik commented 3 years ago

Or the macro language could be just like Scheme, a dynamic language; and at the boundary when passing the macroexpansion from the macro language to the main language, everything would be turned into datums. Inside the macro language, it wouldn't have to know or care that datums exist.

mnieper commented 3 years ago

Could you remove the references to "macro language". The macro language doesn't have to be necessarily different from the "main language". Actually, as I argued earlier, I think the macro language should be the same as the main language, just running at a different phase.

Back to the topic: '(1 2 3) could also evaluate to (pair 1 (pair 2 (pair 3 '())))...

lassik commented 3 years ago

My whole point rested on the macro language being different from the main language :)

But it depends on what we mean by "language". Maybe the macro language could have the same semantics as the main one, but so that elements of the surface syntax have a different meaning. If the macro language is statically typed, then it has to deal with datums which has broad implications.

mnieper commented 3 years ago

My whole point rested on the macro language being different from the main language :)

But it depends on what we mean by "language". Maybe the macro language could have the same semantics as the main one, but so that elements of the surface syntax have a different meaning. If the macro language is statically typed, then it has to deal with datums which has broad implications.

Please let us discuss this somewhere else. This issue shall be about the original question raised. Otherwise, it is hard for newcomers to catch up and contribute. 😉

johnwcowan commented 3 years ago

Given a Steme program without any type annotations, how much do we want to allow it to be different from a Scheme program? For example, do we want to have the same procedures with more or less exactly the same arguments so that the code could also be understood by an ordinary Scheme implementation?

I would say that should be a guideline but not a rule. Not every Scheme procedure needs to be accessible from Steme.

While this sounds very attractive at first glance, it would limit the design space for Steme a lot. For example, in such a Steme, we really couldn't retrofit impure procedures to become internally pure (e.g. the Haskell or Clean approach couldn't work).

I have yet to be convinced that theser approaches are necessary. Haskell needs the IO monad to become a Turing-complete language. Steme doesn't, because it doesn't have to be Turing-complete by itself. See #20 for some thoughts about an impure annex to Steme.

For example, cons should be able to produce pairs where both factors have different types. In ordinary Scheme, cons is also used to prepend elements to lists. These lists will have to be most likely homogeneous in Steme, so it is questionable whether the same cons can be used for both purposes in Steme, so we probably have to invent a few new core procedures anyway.

I would keep cons for list prepending, which is probably 99% of its uses, and treat arbitrarily typed pairs as a possible representation of 2-tuples, thus not a primitive datatype in Steme at all. Building binary trees out of pairs in Scheme is really just a storage hack: it suffices to use 2-element lists, which allows an immediate generalization to n-ary trees. I'm not sure what to say about handling improper lists, except that they are obviously not Steme lists.

mnieper commented 3 years ago

I have yet to be convinced that theser approaches are necessary. Haskell needs the IO monad to become a Turing-complete language. Steme doesn't, because it doesn't have to be Turing-complete by itself. See #20 for some thoughts about an impure annex to Steme.

Which definition for Turing-completeness do you use so that Haskell is not Turing-complete without the I/O monad?

I would keep cons for list prepending, which is probably 99% of its uses, and treat arbitrarily typed pairs as a possible representation of 2-tuples, thus not a primitive datatype in Steme at all. Building binary trees out of pairs in Scheme is really just a storage hack: it suffices to use 2-element lists, which allows an immediate generalization to n-ary trees. I'm not sure what to say about handling improper lists, except that they are obviously not Steme lists.

I could live without giving pairs the special status they have in Lisp or ordinary Scheme. But we may have to deal with improper lists at least when dealing with source code. (Unless we drop Scheme's uses of improper lists in program texts and macros.)

johnwcowan commented 3 years ago

Offtopic:

With my purist hat on, it annoys me that the notation Scheme-the-language uses for an indefinite number of arguments is tied not only to the notation for improper lists (as it is not in CL, for example) but also to the run-time data structure of (proper) lists. In principle, I think, Scheme should be completely independent of what data structures are or are not available in the language. It would be much better to have an API that deals with optional arguments more like C's va_args: you can process them in order and that's it, perhaps as a generator. Then they can just stay on the call stack with no consing required. On the other hand, something else would be needed for apply too, possibly an accumulator.

mnieper commented 3 years ago

I agree that the way Scheme does it, does not have to compulsory for Steme.

There is some fine model behind Scheme's approach, though. One can actually view it as a language like ML where each procedure and continuation only receives one argument, which is necessarily of list type. All applications are in the first place of the form (apply a b), where b is a proper list, and each procedure is of the form (lambda c ...). In this mental model, a general application (a b1 b2 b3) is just syntactic sugar for (apply (quasiquote (,b1 ,b2 ,b3))) and (lambda (c1 c2 c3) ...) is syntactic sugar for (lambda c (let ((c1 (car c)) (c2 (cadr c)) (c3 (caddr c))) ...)).

The only irregularity here seems to be that the conceptually unary functions necessarily have to take proper lists as arguments. It would be more regular if they could take any Scheme object as an argument. This is the approach of the Kernel language, for example. In such a regularization (of the model that all procedures take only one argument), the last argument of apply can be any object, for example.

I currently see the following options for Steme:

  1. Try to mimic current Scheme as much as possible although John and me have been arguing here that it is not perfect.
  2. Try something like C's va_args. But note that they do not work well in C because variadic functions cannot be forwarded.
  3. Get rid of variadic functions altogether so that each function has a fixed finite number of arguments and returns a fixed finite number of values. Data of variadic length has to be packed explicitly in data structures (not restricted to lists). There is no need for apply. This is what modern C aims to do, thinking of va_args as a relict.
  4. Follow the Kernel model and interpret every contemporary Scheme function as a unary function taking a proper list but allow any other Scheme object instead of just proper lists.
johnwcowan commented 3 years ago

I am confused. Quasiquotation is not tied to macros.

It is, actually: quasiquote lexical syntax expands into macros. That's why, for example, syntax-rules allows vector notation as well as list notation: not because you'd normally want to write a macro whose arguments are wrapped in vectors, but because you want quasivectors like (quasiquote #(1 2 3 (unquote b) 5)) to work.

mnieper commented 3 years ago

I am confused. Quasiquotation is not tied to macros.

It is, actually: quasiquote lexical syntax expands into macros. That's why, for example, syntax-rules allows vector notation as well as list notation: not because you'd normally want to write a macro whose arguments are wrapped in vectors, but because you want quasivectors like (quasiquote #(1 2 3 (unquote b) 5)) to work.

I agree but I fail to see what this truth has to do with the context in which my confusion arised.