pre-srfi / static-scheme


First random comments on John's first draft #1

Open mnieper opened 3 years ago

mnieper commented 3 years ago

Ideas for a statically typed dialect of Scheme

Name: I'll use Steme here, but it might be Statics or something else.

A working title is good; the final name doesn't matter much at the moment.

What: A statically typed pure eager language.

Sounds good to me if it has some support for co-inductive data (e.g. streams) as well.

Code: Steme looks and feels like Scheme, but behaves more like ML: a statically typed eager language

At some point, we should reflect on what Steme actually does better than ML.

Immutable: All Steme types are immutable, though at the Scheme level they may be mutable.

The immutability at the Steme level is enforced due to purity, isn't it? If something is actually mutated, the type system has to make sure that the program still behaves as if nothing has been mutated.

Persistent: Some but perhaps not all types have persistent variants.

Procedures: Accept multiple arguments and return multiple values, unlike Haskell or ML. Therefore no currying needed (see SRFI 36).

You mean SRFI 26, I suppose.
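For reference, a minimal sketch of the partial-application style SRFI 26 offers in place of currying (assuming an implementation that exports it as (srfi 26)):

;; SRFI 26: partial application via cut, no currying needed.
(import (scheme base) (srfi 26))

(map (cut + 10 <>) '(1 2 3))   ; => (11 12 13)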

Before writing down a specification of the type system, I find it hard to give a good answer to this question. In ML, I can pass tuples around and store them in data structures, making them first-class values. In Scheme, I cannot do this directly with multiple values.
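A small illustration of the asymmetry in plain R7RS (div/mod is a hypothetical example procedure):

;; Multiple values can only be consumed immediately; they are not
;; first-class objects that can be stored or passed around.
(define (div/mod a b)
  (values (quotient a b) (remainder a b)))

(call-with-values (lambda () (div/mod 7 2)) list)   ; => (3 1)
;; (vector (div/mod 7 2)) is not portable: the continuation of the call
;; expects exactly one value, so the second value is lost or an error.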

Macros: Same in Steme as in Scheme (unclear exactly what macros R7RS-large will be required to support). Unclear whether low-level macros can be written in Steme.

The type system of Steme is hopefully rich enough to make macro transformers typeable. It won't be a trivial task, though.

Tuples: Multiple values.

See my comment above.

Static typing: Code must typecheck using Hindley-Milner.

HM is a classical system with a lot of merits. But we should investigate more recent developments and extensions of ML (you mentioned some below) before we finally decide on a type system. There seems to be an agreement that HM alone is not expressive enough.

Parametric polymorphism: As in ML. Examples: list, vector, dictionary.

Ad hoc polymorphism: Type classes as in Haskell. (ML has exactly one type class corresponding to Haskell Eq.) Support multi-parameter classes, as GHC does but Haskell 98 does not.

Don't multi-parameter classes have some ambiguities?

Again, at some point, we need to write down the type system formally and prove that it is sound. At the moment, everything we want cannot be more than a wishlist.

A very Lisp/Scheme-like approach would actually be to implement the type system itself in a library (run at an earlier phase) so that the type system is as flexible as Scheme code is, thanks to macros.

Any type system is always a compromise (between safety and expressibility), so a programmable type system would be great.

Simple numeric types: Exact integer, exact rational, inexact real, inexact complex.

Looks good.

Characters: Not a type, just 1-character strings, but can be represented separately at the Scheme level.

Looks good as well.

Strings: Iterable but not indexable. Iteration can be by codepoint, default and extended grapheme cluster, line, or language-specific levels like word and sentence.

Looks good. We may think about whether strings are ropes behind the scenes so that persistent update operations can be easily specified.

Simple types: Defined by Scheme predicates.

Intersection types: Scheme record types.

Union types: Non-reified types whose predicate is an OR of the member types.

Part of the wish-list from above.

I would like to add uniqueness types so that linear update procedures can be used safely.

We also have to decide in which way we want to model I/O in a pure language.

FFI: Scheme is the FFI of Steme. In order to call a Scheme procedure, the programmer must certify that it is pure and what type it returns. Such certifications can be global or local.

Compilation targets: R7RS Scheme; a particular Scheme with its type annotations (hoping that the Scheme compiler will remove type checks); C code with stubs for a particular Scheme's FFI (minimal type checks to separate fixnums and bignums, etc.). Not a goal to compile to other random languages.

Good.

Calling Steme: If Scheme calls Steme with arguments of the wrong type, it is an error, but Steme may or may not catch the error, depending on the implementation.

Exceptions: Steme can raise an exception. Steme can also catch an exception and replace it with an eagerly computed value, but it does not know what the exception is.

Exceptions can be explained in terms of call/cc. We should therefore talk about call/cc first. I don't see a reason why Steme cannot inspect any caught exception.

(Of course, we also have to specify dynamic environments in Steme to define exception handlers.)
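To make the call/cc connection concrete, here is a minimal sketch in plain R7RS (the names my-raise and my-guard are hypothetical) of a non-resumable exception mechanism built from call/cc and a dynamically scoped handler:

;; Exceptions in terms of call/cc: the handler lives in a dynamically
;; scoped parameter; raising invokes it, and guarding installs a
;; handler that escapes through a captured continuation.
(define current-handler
  (make-parameter (lambda (c) (error "unhandled condition" c))))

(define (my-raise condition)
  ((current-handler) condition))

(define (my-guard handler thunk)
  (call/cc
   (lambda (k)
     (parameterize ((current-handler (lambda (c) (k (handler c)))))
       (thunk)))))

(my-guard (lambda (c) 'caught)
          (lambda () (+ 1 (my-raise 'oops))))   ; => caught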

Logging: Steme can log anything, but it is an error to attempt to reread the log.

Because of purity?

johnwcowan commented 3 years ago

I've silently removed everything I agree with and adjusted the pre-spec accordingly.

Sounds good to me if it has some support for co-inductive data (e.g. streams) as well.

That leads to the question: should Steme be a total functional language? That is, partial functions are not allowed and all recursion is structural or of (n-k) form? TFP with data and codata is very appealing formally, but may be too annoying to use.

At some point, we should reflect on what Steme actually does better than ML.

I think one answer is that it is twinned with Scheme without merging with it. Steme doesn't have to do everything.

The immutability at the Steme level is enforced due to purity, isn't it?

Yes.

Before writing down a specification of the type system, I find it hard to give a good answer to this question. In ML, I can pass tuples around and store them in data structures, making them first-class values. In Scheme, I cannot do this directly with multiple values.

SRFI 195 is your friend here.
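For the record, a sketch of what SRFI 195 provides (assuming an implementation that ships it as (srfi 195)):

;; SRFI 195: boxes that hold several values at once, making
;; multiple values storable and passable as a single object.
(import (scheme base) (srfi 195))

(define b (box 3 1))                              ; box up two values
(call-with-values (lambda () (unbox b)) list)     ; => (3 1)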

The type system of Steme is hopefully rich enough to make macro transformers typeable. It won't be a trivial task, though.

I'd be happy to stay away from it.

Don't multi-parameter classes have some ambiguities?

I don't know. They seem to be well-accepted in Haskell, and are necessary to emulate the ML module system in Haskell. (I can't find the paper about modules with typeclasses and typeclasses with modules that I read.)

Any type system is always a compromise (between safety and expressibility), so a programmable type system would be great.

It would, but I don't know if it's practical.

Looks good. We may think about whether strings are ropes behind the scenes so that persistent update operations can be easily specified.

As I said, I want to distinguish between persistent and non-persistent immutable data types rather than confounding them.

I would like to add uniqueness types so that linear update procedures can be used safely.

That will take some thought.

We also have to decide in which way we want to model I/O in a pure language.

I'd prefer to leave it in Scheme, at least for now. This ties in with streams, which tie in with data/codata.

Exceptions can be explained in terms of call/cc.

There are two issues here. Scheme has resumable exceptions in the form of raise-continuable. However, the proof obligation imposed on a Steme programmer, namely that whatever Scheme exception handler happens to be at the top of the stack will return a value of a particular type, will, I think, be too difficult. In general, calling unknown first-class procedures is tricky because you don't know what you are targeting, especially for a construct meant to operate across abstraction boundaries. So let's say we confine ourselves to raise only.
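For concreteness, this is the R7RS behaviour that creates the proof obligation (the handler's return value flows back into the raising context):

;; With raise-continuable, whatever the handler returns becomes the
;; value of the raise-continuable expression, so the raiser has to
;; trust the handler to return a value of the right type.
(with-exception-handler
  (lambda (condition) 42)
  (lambda () (+ 1 (raise-continuable 'needs-a-number))))
;; => 43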

I don't see a reason why Steme cannot inspect any caught exception.

Because it is an effect. See pp. 31-34 of "Tackling the Awkward Squad" for better explanations than I can give. In brief, though: what is the type of (throw x1) + (throw x2)? It depends on the order of evaluation, which is something neither Haskell nor Scheme (nor indeed C) guarantees, though CL and Java do. By returning an imprecise exception, something that might be x1 or x2, you can throw exceptions anywhere but can catch them only in the IO monad, where indeterminacy is expected.
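The same indeterminacy shows up directly in Scheme, since the order of argument evaluation is unspecified:

;; Which exception is observed depends on the (unspecified) order in
;; which the implementation evaluates the arguments of +.
(guard (c (#t c))
  (+ (raise 'x1) (raise 'x2)))
;; => x1 or x2, depending on the implementation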

Logging: Steme can log anything, but it is an error to attempt to reread the log.

Because of purity?

Just so. Indeed, if it were guaranteed that there was no closed loop from writing to reading outside the program, these activities would be effect-free, with readable files containing data and writable files containing (a finite unfolding of) codata.

mnieper commented 3 years ago

I will comment on the two issues that I think are most important first:

We also have to decide in which way we want to model I/O in a pure language.

I'd prefer to leave it in Scheme, at least for now. This ties in with streams, which tie in with data/codata.

Handling I/O in Scheme means that the Scheme procedures we want to call will necessarily be impure (because I/O is done for its side effects). We therefore have to isolate this bit of impurity, as otherwise impurity will creep into the whole language. So we don't win much by pushing I/O to the other side of the FFI border.

In order to fence in the impurity from I/O, any I/O procedure can be modeled as a function T x World -> S x World, which conceptually captures the interface to the world. The type system can then hide the World state in a monad or use some other trick to make it more pleasant to use, but the underlying mechanism wouldn't change. Any foreign Scheme procedure that does I/O will have to be typed using the World type, at least implicitly.

Of course, just having a World type is not enough. The type system has to forbid sharing of World states and ensure that each one is used in an at most linear fashion. In Haskell, this is done by hiding it in the I/O monad. In Clean, this is done through uniqueness types.

So the upshot is this: whether I/O is done by Scheme or Steme procedures/functions does not answer the question of how I/O is modeled in a pure language, or of how we want to model it.
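A hypothetical sketch of the T x World -> S x World shape written in plain Scheme (the linearity of the world token is of course not enforced here; enforcing it is exactly the type system's job):

;; Each I/O "function" consumes a world token and returns its result
;; together with the next world token, which must be threaded, never shared.
(define (read-line/world world)
  (values (read-line) (cons 'read world)))

(define (write-line/world s world)
  (write-string s)
  (newline)
  (values 'ok (cons 'write world)))

(define (echo world0)
  (let*-values (((line world1) (read-line/world world0))
                ((_ world2)    (write-line/world line world1)))
    world2))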

There are two issues here. Scheme has resumable exceptions in the form of raise-continuable. However, the proof obligation imposed on a Steme programmer, namely that whatever Scheme exception handler happens to be at the top of the stack will return a value of a particular type, will, I think, be too difficult. In general, calling unknown first-class procedures is tricky because you don't know what you are targeting, especially for a construct meant to operate across abstraction boundaries. So let's say we confine ourselves to raise only.

Because it is an effect. See pp. 31-34 of "Tackling the Awkward Squad" for better explanations than I can give. In brief, though: what is the type of (throw x1) + (throw x2)? It depends on the order of evaluation, which is something neither Haskell nor Scheme (nor indeed C) guarantees, though CL and Java do. By returning an imprecise exception, something that might be x1 or x2, you can throw exceptions anywhere but can catch them only in the IO monad, where indeterminacy is expected.

Forbidding raise-continuable would be throwing the baby out with the bathwater, I think. For a returning raise-continuable, the situation is no different from calling some function that is bound in the dynamic environment. If the handling of the exception reaches across the interface between Steme and Scheme, it is as if we bind a Scheme procedure in the dynamic environment, which has to be assigned a type as with all FFI procedures.

As for raise, there isn't a problem if it is modeled correctly: in a lazy language like Haskell, when started, the program receives (conceptually) a world state value. The type system guarantees that this value will be used at most once. In a strict language, which is dual, the initial world state value becomes under duality a final continuation that accepts a world state. The only way to halt the program is to finally call this final continuation. Here, the type system has to guarantee that the world state, when delivered to the final continuation, hasn't been used more than once.

Now raise may terminate a program, so it needs a world state, as otherwise it cannot eventually pass one to the final continuation. Thus, code like (+ (raise x1) (raise x2)) won't be accepted by the type checker, as the world state would have to be duplicated. In other words, raise is like an I/O procedure in that it takes a world state. It doesn't deliver a value, though, because it is a continuation. Those parts of Steme that potentially raise exceptions therefore have to be declared as such and will be handled in a definite sequential order.

The procedure raise-continuable is different. It doesn't need a world state because it returns.

lassik commented 3 years ago

That leads to the question: should Steme be a total functional language? That is, partial functions are not allowed and all recursion is structural or of (n-k) form? TFP with data and codata is very appealing formally, but may be too annoying to use.

This is unlikely to work out conveniently, though I'm open to arguments that it will. What's the type inference situation?

At some point, we should reflect on what Steme actually does better than ML.

I think one answer is that it is twinned with Scheme without merging with it. Steme doesn't have to do everything.

My original motivation with Statics was exactly that ML is good enough as it is; even a subset of SML/OCaml is good enough. In that case the whole language design project is about Scheme-like surface syntax; adding Scheme macros; and discovering a good way to ship data between dynamic Scheme and a static language. The type system research aspect started with Marc :) However, it's very educational and may yet lead to something good.

Before writing down a specification of the type system, I find it hard to give a good answer to this question. In ML, I can pass tuples around and store them in data structures, making them first-class values. In Scheme, I cannot do this directly with multiple values.

SRFI 195 is your friend here.

Multiple-value boxes. Whoa, you're right. Another happy coincidence.

The type system of Steme is hopefully rich enough to make macro transformers typeable. It won't be a trivial task, though.

I'd be happy to stay away from it.

One benefit of our choice to go with full static typing, i.e. all type errors being caught at compile time, is that the compiler will catch any type errors in a macro expansion. This makes dynamically typed macros much less problematic; I'd agree with John that they're good enough.

the paper about modules with typeclasses and typeclasses with modules

This one? https://www.reddit.com/r/ProgrammingLanguages/comments/j7ypnq/unifying_typeclasses_and_modules/

Any type system is always a compromise (between safety and expressibility), so a programmable type system would be great.

It would, but I don't know if it's practical.

If I'm not mistaken, Qi (lisp) has something like that. Are dependent types also going in that direction?

We also have to decide in which way we want to model I/O in a pure language.

I'd prefer to leave it in Scheme, at least for now. This ties in with streams, which tie in with data/codata.

Like Marc, I'd also like to have I/O in the static language as well. I'd prefer that the static language is also usable as a standalone language apart from Scheme if someone wants to do that.

mnieper commented 3 years ago

That leads to the question: should Steme be a total functional language? That is, partial functions are not allowed and all recursion is structural or of (n-k) form? TFP with data and codata is very appealing formally, but may be too annoying to use.

This is unlikely to work out conveniently, though I'm open to arguments that it will. What's the type inference situation?

In a total functional language, one cannot write an interpreter for itself. This limits the applicability considerably.

One benefit of our choice to go with full static typing, i.e. all type errors being caught at compile time, is that the compiler will catch any type errors in a macro expansion. This makes dynamically typed macros much less problematic; I'd agree with John that they're good enough.

I tend to disagree. Even Haskellers admit that the procedural macro system of Scheme (R6RS at the moment, because there is no such thing for R7RS yet) is much more appealing than things like Template Haskell. In Scheme, it's only the phase level that tells macro code and runtime code apart, and exactly the same language is used at each meta level. If possible, I would like to carry over this beauty to Steme. Saying that procedural macros in Steme should be written in dynamic Scheme would be equivalent to saying that procedural macros in Scheme should be written in C (by the FFI analogy).

As macro expanders handle syntax objects and syntax objects are just Scheme datums with some hygiene annotations sprinkled in, I don't see a major problem to handle them in a type-safe manner. (For comparison, the GHC compiler also handles Haskell syntax trees in a type-safe manner.)
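For comparison, an R6RS-style procedural macro: the transformer is ordinary Scheme code mapping a syntax object to a syntax object, so in Steme one could imagine giving it a type along the lines of Syntax -> Syntax (the type name is only an illustration):

;; A procedural macro transformer: a function from a syntax object to a
;; syntax object, written with R6RS syntax-case.
(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       #'(let ((tmp a))
           (set! a b)
           (set! b tmp))))))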

Having a usable type-safe eval is much less clear, although we shouldn't exclude it prematurely. We could have a typed version of eval: Wherever it appears, it has to have an expected type and the expression to be evaluated is rejected if it is not of the expected type.

Like Marc, I'd also like to have I/O in the static language as well. I'd prefer that the static language is also usable as a standalone language apart from Scheme if someone wants to do that.

I agree. A strong tie to dynamic Scheme is one raison d'être for Steme, but as much as Scheme can work without a C layer beneath, so should Steme without a Scheme layer.

lassik commented 3 years ago

Saying that procedural macros in Steme should be written in dynamic Scheme would be equivalent to saying that procedural macros in Scheme should be written in C (by the FFI analogy).

Scheme is a higher-level language than Steme, whereas C is a lower-level language than Scheme. Macros are usually written in a higher- or equal-level language.

As macro expanders handle syntax objects and syntax objects are just Scheme datums with some hygiene annotations sprinkled in, I don't see a major problem to handle them in a type-safe manner. (For comparison, the GHC compiler also handles Haskell syntax trees in a type-safe manner.)

I agree that typed macros are really neat, and would be no problem if they work, but I fear the complexity.

Having a usable type-safe eval is much less clear, although we shouldn't exclude it prematurely. We could have a typed version of eval: Wherever it appears, it has to have an expected type and the expression to be evaluated is rejected if it is not of the expected type.

(as <result-type> (eval '(+ 1 2 3))) and raise an exception if the types don't match?

In (+ 1 (eval '(+ 2 3))) we could omit the as type annotation because the outer (+ 1 ...) constrains type inference so that the other argument must be an integer as well.

mnieper commented 3 years ago

Scheme is a higher-level language than Steme, whereas C is a lower-level language than Scheme. Macros are usually written in a higher- or equal-level language.

In what sense do you mean that Scheme is higher-level than Steme?

(as <result-type> (eval '(+ 1 2 3))) and raise an exception if the types don't match?

If it raised an exception, eval would effectively become an I/O procedure as it would need exclusive access to the world state (by my reasoning from above).

Much better, in my opinion, is an eval with the following polymorphic type (I write types with capital letters and type variables without):

eval<a> : forall b. Datum * (a -> b) * (Error -> b) -> b

It is called as follows: (eval<a> datum success failure). If datum yields a value of the expected type a, the function success is invoked with it. Otherwise, failure is invoked with some error. Both success and failure have to return the same type b. (eval<a> is just a placeholder for some unspecified syntax to describe the expected type.)

One example to use eval<a> is:

(eval<a> datum right left)

where right and left are two constructors of an Either type encapsulating a value of type a on the right side and of type Error on the left side. In other words, b becomes this Either type.

If one actually needs exceptions, it would be (modulo syntactic sugar):

(eval<a> datum (lambda (x) x) (lambda (e) (raise the-world e)))

Note that this expression needs access to the world state.
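A dynamically checked approximation of this interface in today's Scheme (the name eval-typed is hypothetical, and the expected type a is represented by a predicate) might look like:

;; eval-typed: evaluate datum, then dispatch on whether the result
;; satisfies the predicate standing in for the expected type a.
(import (scheme base) (scheme eval))

(define (eval-typed datum pred success failure)
  (let ((v (eval datum (environment '(scheme base)))))
    (if (pred v)
        (success v)
        (failure (list 'wrong-type v)))))

(eval-typed '(+ 1 2 3) number?
            (lambda (x) x)
            (lambda (e) 'oops))   ; => 6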

lassik commented 3 years ago

OK, that makes sense!

lassik commented 3 years ago

Scheme is higher-level than Steme in the sense that it is more flexible about types (and about I/O, if we choose to restrict I/O in Steme). If Steme has some compulsory type annotations (which I hope we don't, but let's see) those will not be needed in Scheme. Otherwise it's about on an equal level.

mnieper commented 3 years ago

Scheme is higher-level than Steme in the sense that it is more flexible about types (and about I/O, if we choose to restrict I/O in Steme). If Steme has some compulsory type annotations (which I hope we don't, but let's see) those will not be needed in Scheme. Otherwise it's about on an equal level.

Thanks for the explanation. I came from John's FFI analogy, so my "higher-level" was a different one than yours.

As for compulsory type annotations: I wish we didn't need them, but it may very well be the case that the resulting type system would otherwise be too limited.

lassik commented 3 years ago

With compulsory type annotations, there's the risk that it will soon look something like Rust, which in principle has type inference for a lot of stuff but in practice the code is full of annotations. It has a very different feel from Scheme (which we hope to follow in spirit), and it's difficult to make a case that code should be so heavily annotated unless it solves some truly difficult problem. It may turn out that we have no choice but to limit the type system if we want to keep annotations minimal.

The complexity (for humans, as well as big-O complexity) of the type system and the inference/type-checking algorithm is also worth paying attention to. Rust and Haskell have big and slow compilers. Even something like the SML module system is already quite complex. For many Schemers, simplicity is a big draw of Scheme.

mnieper commented 3 years ago

There is something between no type annotations and "code full of type annotations": what I meant is that we may need some type annotations in well-specified corners, which may include record types or type classes or whatever.

Limiting the type system could mean that we lose too much expressiveness. C is in such a situation, where one absolutely needs void *.

lassik commented 3 years ago

Right - something in the ballpark of type classes could still be reasonable.

What expressiveness do we lose? MLs solve I/O and exceptions by not incorporating those aspects of the language into expressions' type signatures. The language is expressive, it just checks fewer things.

mnieper commented 3 years ago

ML is not pure HM but extends HM with modules, for example.

lassik commented 3 years ago

Ah, that's right. And subtyping (which H-M has trouble with) is done higher up in the module layers.