jank-lang / jank

A Clojure dialect hosted on LLVM with native C++ interop
https://jank-lang.org
Mozilla Public License 2.0

question.. syntax #1

Closed dobkeratops closed 9 years ago

dobkeratops commented 9 years ago

interesting project.

suggestion:

re: static typing & templates: have you considered retrofitting type annotations in a way that fits the syntax of an existing lisp?

e.g. start out with untyped code that looks like clojure, generating C++ templates: (defn foo [x y z]...) compiles to template<typename X,typename Y, typename Z> auto foo(X x,Y y, Z z){...} i.e. an argument without any annotation gets an anonymous type parameter.

Then come up with a retrofitted syntax to add annotations for named type-parameters and actual types (perhaps a parenthesised pair in the argument list could specify (argname type)),
e.g. (defn foo [(a int)(b int) c]...) would compile to template<typename C> auto foo(int a,int b, C c){...}
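As a rough C++14 illustration of the mapping proposed above: the thread only gives the signatures, so the function bodies and the name `bar` below are assumptions made up for the example.

```cpp
// Hypothetical expansion of (defn foo [x y z] ...): every unannotated
// argument gets its own anonymous type parameter.
template <typename X, typename Y, typename Z>
auto foo(X x, Y y, Z z) {
  return x + y + z;  // placeholder body, assumed for illustration
}

// Hypothetical expansion of (defn bar [(a int) (b int) c] ...): annotated
// arguments become concrete types; only c stays generic.
template <typename C>
auto bar(int a, int b, C c) {
  return a + b + c;  // placeholder body, assumed for illustration
}
```

Calling `foo(1, 2, 3)` instantiates the template with three `int` parameters, while `bar(1, 2, 0.5)` only deduces `C` as `double`.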

Clojure's syntax is pretty interesting IMO; the special use of [] and {} does give your eye something specific to latch onto whilst still being easy to parse and not too far from other lisps.

another possible inspiration for a type annotation syntax is http://adambard.com/blog/core-typed-vs-haskell/

I do realise 'jank' is NOT designed to be compatible with any existing lisp, being a whole new language with unique goals, but maybe if you more closely follow an existing syntax you're more likely to find other people can intuitively read it - 'principle of least surprise'.

I notice ':' having use in your syntax for types, but in lisps that gets used for keywords. Perhaps you could retain lisp-like keywords and a clojure-like syntax where {:x 0 :y 1 :z 2} is like an 'anonymous struct constructor', initializing named fields, and keyword arguments (which seem like a really nice feature in lisps).

jeaye commented 9 years ago

On Thu, Sep 24, 2015 at 12:35:15AM -0700, dobkeratops wrote:

interesting project

Thanks for the support. Let me hit these points as they come.

re: static typing & templates: have you considered retrofitting type annotations in a way that fits the syntax of an existing lisp?

e.g. start out with untyped code that looks like clojure, generating C++ templates: "(defn foo [x y z]...)" compiles to "template<typename X,typename Y, typename Z> auto foo(X x,Y y, Z z){...}" i.e. an argument without an annotation gets an anonymous type parameter.

Then come up with a syntax to add annotations for named type-parameters and actual types (perhaps a parenthesised pair in the argument list could specify (argname type)), e.g. "(defn foo [(a int)(b int) c]...)" compiles to "template<typename C> auto foo(int a,int b, C c){...}".

Clojure's syntax is pretty interesting IMO; the special use of [] and {} does give your eye something specific to latch onto whilst still being easy to parse and not too far from other lisps.

another possible inspiration for a syntax is http://adambard.com/blog/core-typed-vs-haskell/

This is something I've considered. However, I'm not a fan of separating the types into a different declaration; as jank progresses, judicious usage of auto/Ɐ for function params and return types will help things to feel more like the dynamically typed languages. Right now, that works with return types (in almost all cases) but isn't yet implemented for function arguments, as I'm still working on generic functions as a whole.

I do realise 'jank' is NOT designed to be compatible with any existing lisp, being a whole new language with unique goals, but maybe if you more closely follow an existing syntax you're more likely to find other people can intuitively read it - 'principle of least surprise'.

That's right. In fact, I take extreme caution not to call jank a lisp or dialect of lisp at all; it's clear, of course, that it's greatly inspired by both lisp (namely clojure) and c++ though. I'll hit the 'principle of least surprise' bit next.

I notice ':' having use in your syntax for types, but in lisps that gets used for keywords. Perhaps you could retain lisp-like keywords and a clojure-like syntax where {:x 0 :y 1 :z 2} is like an 'anonymous struct constructor', initializing named fields. Keyword arguments themselves seem like a really nice feature in lisps worth keeping IMO.

You bring up a few points here. Firstly, I've put a lot of thought into allowing () [] and {}, as clojure does, and I've decided against it, for the sake of simplicity. I think, given sufficient (type-safe!) macros, jank should be able to achieve good expressiveness and I'd like to keep the syntax simple.

Following that 'simple syntax' rule, jank doesn't use much beyond () and : until you get to variadics, when & shows up. It doesn't support reader macros, has a common namespace for functions and variables, and so once you know the () and : you're pretty much in the clear. This is not a 'principle of least surprise' relative to other s-expr based languages, but it is, I hope, internally very consistent and easy to understand.

Lastly, keywords. They work well in dynamic languages, no doubt. In jank, a strong, statically typed language, what would be the type of a keyword? Do they all have the same type? The best use for keywords I've come up with so far is like this:

  1. Each keyword has its own unique type
  2. Keywords can be used, similar to CL and Clojure, as a way to provide policies to functions

That said, using keywords in maps and arbitrarily shooting around runtime symbols in jank is not going to happen; it defeats the purpose of the strong, static typing and all the work done at compile-time. Where keywords may be used, however, is in a variadic type list, representing unique data. Here's a simple example, ported from one in jank's README:

(; Generic on the component type and a variadic number of policies. ;)
(struct coord : (:T-component &T-policies)
  (x T-component)
  (y T-component))

(; :cartesian represents a unique type, conveying that
 ; the coordinate should be in a certain system. ;)
(bind point (new : (coord : (real :cartesian))))

(; A function could then check for this policy when graphing. ;)
(ƒ graph : (:T-component &T-policies) (point : (T-component T-policies)) ()
                where (^contains? T-policies :cartesian)
 (do graphing with point))

Note that ^contains? is a type-safe macro predicate (like a type trait in C++). Now, with all of that said, this may not make it into jank. If keywords go in, to reiterate, it will likely be in this form.
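For readers who think in C++ terms, the policy idea can be approximated with empty tag types and a variadic type trait. This is only a sketch of the concept under stated assumptions, not jank's implementation: the names `cartesian`, `polar`, `contains`, and `is_cartesian` are invented here, and `std::disjunction` requires C++17.

```cpp
#include <type_traits>

// Each "keyword" is modelled as its own empty type (hypothetical tags).
struct cartesian {};
struct polar {};

// Rough analogue of ^contains?: true when T appears in the pack Ts.
template <typename T, typename... Ts>
struct contains : std::disjunction<std::is_same<T, Ts>...> {};

// Generic on the component type and a variadic number of policies.
template <typename Component, typename... Policies>
struct coord {
  Component x;
  Component y;
};

// A compile-time check a graphing function could use to pick a code path.
template <typename Component, typename... Policies>
constexpr bool is_cartesian(coord<Component, Policies...> const&) {
  return contains<cartesian, Policies...>::value;
}
```

Here `coord<double, cartesian>` and `coord<double, polar>` are distinct types, so the policy is checked entirely at compile time and carries no runtime data.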

As a final point, where did you hear of jank? I haven't been spreading it around much yet, since it's still very young.

jeaye commented 9 years ago

To follow up about the keywords: If I do choose to implement them as shown above, there's a good chance it will end up affecting the dependent type syntax (:T-component expresses that it's dependent and is not a full specialization -- C++ terms, sorry). The policy approach can certainly still work without keywords, but it'd require making the types ahead of time and then referencing them normally. The keyword would simply be a shortcut to implicitly define a type, if needed, which is unique for that keyword.

dobkeratops commented 9 years ago

"As a final point, where did you hear of jank?"

on IRC

I've had a bash at a language myself (rust inspired), perhaps with some overlapping goals (I haven't touched it in months but might go back to it) https://github.com/dobkeratops/compiler

I originally started out with s-expr, but moved to rust-inspired syntax.

I'd always thought it would be interesting to define a language with an s-expression syntax , closer to a raw AST, then make a parser that just compiles a friendlier syntax to that. That would reduce the indecision and conflict over syntax during development,as you could easily plug different syntaxes in.

(of the people that commented on mine, first I had requests for 'significant whitespace', and many people mention that they don't like any divergence from C++ .. e.g rust's 'fn' gets a surprising amount of hate)

One thought that went through my mind when seeing Jank was, "are the goals close enough that my language could compile itself to yours". it would be interesting to compare what we've implemented, and what you intend to implement.

I think we both want something with 'the strengths of C++, but more functionally inspired, for parallelism..'

I might have written a more elaborate compiler but of course that means there's more to go wrong, more to maintain. Your simpler approach going from S-Expr to C++14 is quite interesting.

dobkeratops commented 9 years ago

That said, using keywords in maps and arbitrarily shooting around runtime symbols in jank is not going to happen; it defeats the purpose of the strong, static typing and all the work done at compile-time.

Sure. I agree with that.

Whilst it's a map with keywords in clojure, I'm thinking of compiling its map syntax as anonymous structs, as an optimization for a limited set of uses of maps; all compile-time propagated, the keywords never generate any storage. e.g. {:x 1 :y 2 :z 3} would become x_y_z(1,2,3) with template<typename X,typename Y,typename Z> struct x_y_z{ X x;Y y; Z z; x_y_z(X _x,Y _y,Z _z):x(_x),y(_y),z(_z){} }; (pick some suitable name-mangling scheme)
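Spelled out as compilable C++14, the idea might look like the sketch below. The helper `make_x_y_z` is an assumption added here so the field types can be deduced without C++17 class template argument deduction; the struct name follows the mangling used in the comment.

```cpp
// Hypothetical generated type for the literal {:x 1 :y 2 :z 3}. The field
// names come from the keywords; the keywords themselves occupy no storage.
template <typename X, typename Y, typename Z>
struct x_y_z {
  X x;
  Y y;
  Z z;
};

// Helper that deduces the field types from the literal's values.
template <typename X, typename Y, typename Z>
constexpr x_y_z<X, Y, Z> make_x_y_z(X x, Y y, Z z) {
  return {x, y, z};
}
```

A literal such as `{:x 1 :y 2.5 :z 'c'}` would then lower to `make_x_y_z(1, 2.5, 'c')`, giving an `x_y_z<int, double, char>` whose fields are accessed as plain members.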

I figured an anonymous struct should be possible as a static type - much like tuples in Rust or haskell, just with labelled fields. If it's immutable and propagating types forward, it should be able to work that way.. you'd never need to insert new data.

I've tried to support something like this in my own language attempt, e.g. as return values. I'd wondered if [] could simply make a vector literal, although I realise you'd often want to express std::array<T,N> rather than std::vector.

Perhaps if you were calling functions with actual struct types, one could think about automatically making coercions between anonymous structs and 'real' structs (filling in default values), and optimise out the conversion. That might start to need fancier inference though.

(aside - motivation: When I dabbled with clojure, what impressed me was how easy it was to test things out in the repl: I think the literal syntax contributes to that. It would also make writing tests really easy. I saw a tool where you can write a literal input & output, and it searches all the functions, brute force, finding any that do what you asked :) really handy for discovery.

Given how important tests, discovery, and debugging all are - I figured it may well be worth having [] for anonymous vectors even if they're of no use for actual code.)

jeaye commented 9 years ago

On Thu, Sep 24, 2015 at 02:49:54AM -0700, dobkeratops wrote:

   "As a final point, where did you hear of jank?" IRC

I've had a bash at a language myself, perhaps with some overlapping goals (I haven't touched it in months but might go back to it) https://github.com/dobkeratops/compiler

Neat, let's take a look. I can definitely see the rustiness showing here. I think, for my tastes at this point, this is too imperative! :P Your map function, for example, is a reserve and then iteration and mutation to build the return value. A map function in jank might look like:

(ƒ map (f Ɐ src Ɐ) (Ɐ)
  (if (not (empty? src))
    (cons (f (first src)) (map f (rest src)))
    src))

(; For added clarity, the signature might use some constraints. ;)
(ƒ map (f Ɐ src Ɐ) (Ɐ) where (and (^callable? f) (^container? src))
  ...)
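For comparison, the same recursive shape can be written in C++14 over an immutable cons list. This is a sketch under assumptions, not how jank lowers `map`: the types `cons_cell` and `cons_list` and the function `map_list` are invented here, with `std::shared_ptr` standing in for whatever sharing scheme a real implementation would use.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// An immutable singly linked list cell; the empty list is a null pointer.
template <typename T>
struct cons_cell {
  T head;
  std::shared_ptr<cons_cell const> tail;
};

template <typename T>
using cons_list = std::shared_ptr<cons_cell<T> const>;

// Prepend a value, sharing the existing tail instead of copying it.
template <typename T>
cons_list<T> cons(T head, cons_list<T> tail) {
  return std::make_shared<cons_cell<T>>(
      cons_cell<T>{std::move(head), std::move(tail)});
}

// Recursive map: builds a new list and never mutates src.
template <typename F, typename T>
auto map_list(F f, cons_list<T> const& src)
    -> cons_list<std::decay_t<decltype(f(src->head))>> {
  using R = std::decay_t<decltype(f(src->head))>;
  if (!src) {
    return nullptr;  // empty in, empty out
  }
  return cons<R>(f(src->head), map_list(f, src->tail));
}
```

Unlike the jank version, this recursion is not tail-recursive and will use stack proportional to the list length, so it is meant to show the structure rather than to be a production implementation.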

The only other thing that sticks out would be the OOP support. I've dropped it completely, in jank, even moving RAII handling to non-member functions. When it comes to systems-level work, I very, very rarely use virtual functions or inheritance. Modern C++ gives us value semantics and very powerful compilers so we don't need to.

With that said, I certainly see what you're going for. Especially the way template template params can be cleaned up!

I originally started out with s-expr, but moved to rust-inspired syntax.

I'd always thought it would be interesting to define a language in 2 parts: an s-expression syntax, closer to a raw AST, then a front-end; then you could choose different 'skins', like curly braces or significant whitespace. That would reduce the indecision and conflict over syntax during development. (of the people that commented, first I had requests for 'significant whitespace', and many people mention that they don't like any divergence from C++ .. e.g. rust's 'fn' gets a surprising amount of hate)

Ah! Writing languages to please other people is a nightmare. The approach I'm taking to jank is that it's everything I want it to be, since I'm making it so that I don't have to put up with all this imperative/syntax shit in C++ and all the dynamic typing/garbage collection/JVM deps in CL and Clojure. :P

What I'm currently working on, which is about 95% done (will finish today), is a serialization of jank's AST to a (very verbose) s-expr form. I plan to separate the compiler and interpreter into individual binaries and allow streaming from one to the other, or into a file for later usage, or into a transcompiler for C++14.

One thought that went through my mind when seeing Jank was, "are the goals close enough that my language could compile itself to yours". it would be interesting to compare what we've implemented, and what you intend to implement.

That would be neat! Jank is, as I mentioned, still very young, but I'd be excited to work with you if you were to implement that.

I think we both want something with 'the strengths of C++, but more functionally inspired, for parallelism..'

Well said! I have a background in game development and the state of parallelism in game engines is fucking pitiful. Jank's big sacrifice is data locality for cache optimizations, since its data structures will be more like Clojure's (not contiguous), but, in 2015, and for the type of systems-level work I want to do, that's perfectly ok with me.

Example of the AST printing (note that every jank program is in an anonymous function body): https://gist.github.com/jeaye/1b24173fc5e0a0b0a965

jeaye commented 9 years ago

On Thu, Sep 24, 2015 at 02:56:17AM -0700, dobkeratops wrote:

   That said, using keywords in maps and arbitrarily shooting around
   runtime symbols in jank is not going to happen; it defeats the purpose
   of the strong, static typing and all the work done at compile-time.

When I dabbled with clojure, one feature that impressed me was how easy it was to test things out in the repl: i think the literal syntax contributes to that. I saw a tool where you can write a literal input & output, and it searches all the functions, brute force, finding any that do what you asked :)

Yeah, the REPL is the backbone of most lisps. Figuring out how a statically typed language can work as comfortably with it has been on my burner for a while. I'm currently, on the side, working on a simple REPL in jank which requires a custom build of quite a bit of the source, disabling certain error handling like multiple definitions. With that, I've been working on a web-based REPL using Clojure and Clojurescript to interface with the actual REPL. Hopefully people will be able to try jank securely from my website soon!

If you're into C++ and REPLs, have you looked into cling? I haven't bothered with it, but it seems like an interesting project for sure. Before you consider working on a REPL for your Rusty C++, you might consider riding on cling!

jeaye commented 9 years ago

Your compiler seems like it's much further along than mine, since you actually have codegen. I'll take a look at the source in a bit. Also, I just saw, you have a REPL already. Neat. :)

dobkeratops commented 9 years ago

I have a background in game development

similar :)

The only other thing that sticks out would be the OOP support. I've dropped it completely, in jank, even moving RAII handling to non-member functions. When it comes to systems-level work, I very, very rarely use virtual functions or inheritance. Modern C++ gives us value semantics and very powerful compilers so we don't need to.

Sure. Since my compiler was written in C++, I found using OOP for AST nodes a natural approach. I didn't consider it 'performance critical'. I've seen how functional languages (e.g. rust's implementation) use ADTs. I figured my path for 'self-hosting' would be to transpile my C++ source eventually, so I've tried to implement a very limited form of OOP - limited internal vtables. Not really inheritance, just defining an interface then implementations of that.
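To make the ADT alternative concrete, here is a minimal C++17 sketch of AST nodes as a closed sum type instead of a class hierarchy. It is not taken from either compiler; `num`, `add`, `expr`, `evaluator`, and `eval` are hypothetical names chosen for the example.

```cpp
#include <memory>
#include <variant>

// A tiny expression AST as a sum type: either a number or an addition.
struct num {
  int value;
};
struct add;  // defined below; held by pointer because it is recursive
using expr = std::variant<num, std::shared_ptr<add const>>;

struct add {
  expr lhs;
  expr rhs;
};

int eval(expr const& e);

// One overload per alternative replaces a virtual function per node class.
struct evaluator {
  int operator()(num const& n) const { return n.value; }
  int operator()(std::shared_ptr<add const> const& a) const {
    return eval(a->lhs) + eval(a->rhs);
  }
};

int eval(expr const& e) { return std::visit(evaluator{}, e); }
```

With this layout, adding a new operation means writing a new visitor, whereas adding a new node kind means touching every visitor; the interface-plus-implementations approach described above has the opposite trade-off.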

However I haven't even started a transpiler.

I think, for my tastes at this point, this is too imperative! map implementation

I was basically happy with most of C++ or Rust; I figure either only really needs a few tweaks to satisfy me. Pure functions can have impure subsets, and vice versa. Once you've got immutable default and you know you aren't hitting globals, you do open up a lot of parallelism. You mostly just need to know 'writes don't overlap each other, or overlap reads'.

my approach would be to make a parallelized implementation of map e.g. par_map. you might want a different coarse or fine grain version (and to pass additional hints ,e.g batch size, with a best guess based on the type..)

I wonder if the nesting level could be made visible to the overloading scheme, e.g. so outer iterations become threads, whilst inner iterations become SIMD... whatever (it all depends on the machine). Then you could indeed just write 'map' and know it would fall through to par_threaded_map(), par_simd_map().. where appropriate.
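A coarse-grained `par_map` with a batch-size hint could be sketched in C++14 roughly as follows. This is one possible reading of the idea, not code from either project: the signature, the default batch size of 1024, and the use of `std::async` are all assumptions, and `f` must be safe to call concurrently (for example, a pure function).

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <type_traits>
#include <vector>

// Applies f to every element of src, one task per batch of elements.
// Each task writes a disjoint range of out, so writes never overlap.
template <typename F, typename T>
auto par_map(F f, std::vector<T> const& src, std::size_t batch_size = 1024) {
  using R = std::decay_t<decltype(f(src.front()))>;
  if (batch_size == 0) {
    batch_size = 1;  // guard against a loop that would never advance
  }
  std::vector<R> out(src.size());  // assumes R is default-constructible
  std::vector<std::future<void>> jobs;
  for (std::size_t begin = 0; begin < src.size(); begin += batch_size) {
    std::size_t const end = std::min(begin + batch_size, src.size());
    jobs.push_back(std::async(std::launch::async, [&, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        out[i] = f(src[i]);
      }
    }));
  }
  for (auto& job : jobs) {
    job.get();  // wait for the batch and rethrow any exception
  }
  return out;
}
```

The sketch should not be used when the result type is `bool`, since `std::vector<bool>` packs elements and concurrent writes to neighbouring indices would race. Some toolchains also need a threading flag such as `-pthread` to link it.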

I realise if you have a sufficiently smart compiler you can do all those substitutions. Perhaps you can have a way of feeding it substitutions for optimisation.

I also became interested in the 'epiphany/parallella' (having had experience on PS3 CELL) where you'd want to express DMA (data-movement) as part of the implementations of 'map' etc. Parallella really needs a 'data-flow' language, but I think passing lambdas into high-order-functions can also express data flow quite well. (e.g. MapReduce.. )

It's kind of academic since no decent device like that exists today but there's a chance they'll appear through IoT and exascale(?), and we have seen similar in the past - there are a few companies trying to make something similar now. (rex computing, STM P2012, epiphany)

Their presentations are fascinating, they talk about a near future chip with a grid of processors directly under columns of stacked memory, i.e. each core would have fast access to a fairly significant amount of memory .. almost literally like a tiny scale model of a supercomputer :)

Another point of reference is C++amp which might start making more sense on HSA systems & other unified memory approaches. being able to throw inner iterations 'at the GPU' might make more sense if you aren't throwing data over a PCI bus. As consoles have unified memory this has widespread application already, I think most phone SOCs are too.

dobkeratops commented 9 years ago

"Also, I just saw, you have a REPL already. Neat. :)"

I don't, sorry, that's a red herring :) I'd intended to do one - I wanted a language which was REPL friendly - but haven't put any time into it

If you're into C++ and REPLs, have you looked into cling? I haven't bothered with it, but it seems like an interesting project for sure. Before you consider working on a REPL for your Rusty C++, you might consider riding on cling!

.. i haven't looked into cling , so thanks for the suggestion.

dobkeratops commented 9 years ago

Your compiler seems like it's much further along than mine, since you actually have codegen.

slightly hacky: I emit LLVM source code directly, as ASCII. mostly because I'm a bit lazy at integrating libraries, but it was also intuitive to look at C++ output from clang and compare.

I have tried to abstract the back end 'writer' to do either a C(or C++?) backend later, or use the actual LLVM library. My long term goal was to be able to transpile a language subset both ways.

jeaye commented 9 years ago

my approach would be to make a parallelized implementation of map e.g. par_map. you might want a different coarse or fine grain version (and to pass additional hints ,e.g batch size, with a best guess based on the type..)

You're absolutely right about that. Even const-by-default in C++ (god, I wish there were a warning I could turn on for that -Wconst-by-default) would be a huge step in the right direction, and a huge step toward better thread safety. Still, pure functions, it seems to me, should exist in every place possible. (more below, which Github is collapsing, for some reason ...)

Your implementation, admittedly, doesn't suffer from much of the same data locality/cache optimization issues as jank. Rust's compiler is very complex, especially with its borrow checking, but it offers so much safety. By simplifying the language, and relying on pure functions, I'm hoping to have a simpler compiler (I mean, it's a compiler -- it's still fucking complex) while keeping the same safety guarantees.

But here's the big thing. You have the ability to call into C libraries, as I've seen with your extern functions. Fuck, I don't, and I've bashed my head against a lot of walls trying to figure out how I want to do it. I want jank to be safe; C is about as far from safe as one can possibly get, so exposing any sort of C marshalling directly to jank is not going to happen. Right now, I have a "native plugin" API which allows me to define "native functions" in C++ which will be exposed when the compiler compiles and interprets. The current plan is to allow shared objects containing these native descriptions to be specified to the compiler. So I might write a native OpenGL wrapper and ship it as a library, aside from jank, and someone might use it with (example):

$ jank my_source.jank --native opengl

This has been working well enough so far, but that's because I'm only interpreting. As soon as we get to C++ codegen (or ASM, or LLVM IR, whatever), I'm not sure how these native plugins will be able to describe the code that needs to be generated. As it's currently planned, they won't; the generated code will just call into the shared object and all plugin shared objects must be shipped with the resulting binary. I realize this is not ideal, but I'm not decided on how it can be better (while keeping jank safe!).

Rust does a lot with its unsafe blocks and it means that one can do a lot of pointer magic and C interfacing right in Rust. It's convenient, hell yeah, but there might be a better, safer way.

dobkeratops commented 9 years ago

I want jank to be safe; C is about as far from safe as one can possibly get, so exposing any sort of C marshalling directly to jank is not going to happen

my goal was simpler, just easier to write parallel code than C++, but faster iteration than Rust (not necessarily safe)

Rust does a lot with its unsafe blocks and it means that one can do a lot of pointer magic and C interfacing right in Rust. It's convenient, hell yeah, but there might be a better, safer way.

Perhaps one idea is substitutions, 'shape-analysis': maybe you could supply pairs of implementations that you hint as semantically equivalent, and the compiler is at liberty to swap them. e.g. your 'pure functional map' could be substituted for a piece of framework code that handles thread pools, and beyond that you could supply efficient versions of map+reduce, map+filter, etc .. separating the description of substitutions from the program.

When it comes to calling C however, you rely on the C libraries themselves ... no idea how to improve that, compared to the traditional empirical techniques

Still, pure functions, it seems to me, should exist in every place possible.

agreed, they open the door for more automatic parallelism.

I think Rust does quite well with its iterator libraries (like lazy expression templates in C++?), but I'm interested more in 'internal iteration' for parallelism. I think you can still implement lazy-eval and chaining there but you tend to have to write some final call to actually execute something (e.g. result=foo.map(...).filter(..).collect();). I can see a pure language could be a bit smarter and more intuitive to use.

It would be interesting to see what the haskell community has produced; I have heard of attempts to generate parallel code there but never looked into it.

jeaye commented 9 years ago

Closing! Feel free to join #jank on freenode.