ozra / onyx-lang

The Onyx Programming Language

Macros and Templates #53

Open ozra opened 8 years ago

ozra commented 8 years ago

This is the only (?!) missing feature in Onyx atm, and the culprit is syntax (and a whoooole lot of coding).

Syntax

While indent-significant layout is ultimately apt for coding, it's not so for meta-coding contexts. Meta-code causes indentation that has to be compensated for in verbatim-paste-code, and it quickly becomes unmanageable.

The only conclusion I can arrive at is that in template macros explicit end-tokens will be required, and indentation is not significant. It's not really a normal coding context.

Concept

template blabla(x, y) =
   ...

template def blabla(x, y) ->
   ...

macro blabla(x, y) ->
   ...
stugol commented 8 years ago

Meta-code causes indentation that has to be compensated for in verbatim-paste-code

Why?

The only conclusion I can arrive at is that in template macros explicit end-tokens will be required, and indentation is not significant.

It's a minor limitation. I have no problem with it.

Template Def Macro - -""- for creating defs.

What?

AST Macros are run macros where you work on the AST and return the result instead of working with a template

I'm afraid you'll have to explain things a bit better. My understanding of the AST is pretty much nonexistent. I don't have a background in compiler design.

ozra commented 8 years ago

Looking back I realize I wouldn't understand that myself, had I not written it.

1 (& 2). Let's take a practical example in Crystal code, from the compiler. These are cut-out parts of the whole util, only to highlight the problems faced; they are part of the functionality for debugging the AST (courtesy of, iirc, @bcardiff):

macro dump_prop(name)
  io << "\n" << "  " * (level+1) << {{name.stringify}} << ": "
  if v = {{name}}
    if v.is_a?(Array)
      if v.empty?
        io << "[]"
      end
      v.each_with_index do |e, i|
        io << "\n" << "  " * (level+2) << "[" << i << "]"
        e.dump_inspect(io, level + 2)
      end
    else
      v.dump_inspect(io, level + 2)
    end
  else
    io << "nil"
  end
end

module Crystal
  abstract class ASTNode
    macro def dump_inspect(io, level) : Int32
      io << "\n" << "  " * level << {{@type.name}} #<< '\n'
      {% for ivar, i in @type.instance_vars %}
        {% unless {
            "call": true, # for recursion in Block..
            "a_lot_more_here_cut_out_for_the_example"
          }[ivar.stringify]  %}
        dump_prop @{{ivar}}
        {% end %}
      {% end %}
      0
    end
  end
end

For the first macro, there is no meta-code, so that would be no problem. Moving on to the dump_inspect macro we have a bit of meta. The {% for ... and {% unless ... meta-code is indented, which creates an indent offset. You see dump_prop being used, in verbatim code (to be pasted into the final code); that would end up at 8 spc, while in real Onyx code it should be at 6 spc (aligned with io << "\n"... above). Of course, simply subtracting the meta-code indentation (at +2) gives 6 - so, as already mentioned, it's arithmetically easily solvable.

Now instead imagine you're generating two different if-clauses depending on a meta-level condition. Now you have the if-part indented +2spc in the macro-code, inside the meta-if-condition. But the body (for this example) is invariable, so that will be indented at the correct level for verbatim code; thus the if and its body will be at the same level of indent, which will look confusing (even though for the compiler it's just one subtraction away). Now, imagine several levels of meta-conditional generation of verbatim-if-conditions and it quickly becomes unmanageable for the brain to keep track of what indents there "really are".

So two options:

I personally think it is clear that this meta-coding-context is too distanced from normal code to be usable with indent-sensitivity.

The macro-meta-code follows indent rules as usual (it's just Onyx-code), but the contents are just treated as "arbitrary" string-data in an indent-vague context, which requires stating the range of the expression-blocks.

  1. In crystal you can define def-macros. These behave differently in that the parameters are the finally generated function's parameters and not parameters to the macro. As the name suggests, it's solely for generating funcs/methods.
  2. The compiler is implemented as simply a part of the standard library - this means you can include the compiler modules in your program and parse code and generate an AST from it. The most common (most maintainable) model of compilers is to first lex the code (create individual, textually identifiable tokens), then pass those to a parser which judges them based on the context and creates AST-nodes. The AST-nodes form a tree that represents the entire program. You can generate the whole program in source-code from this representation by visiting the entire tree and outputting strings in accordance with the features of the nodes. You can also work on it to re-factor, do escape-analysis, validate symbol references, do type inference, what the hell you want, because it's in a computationally convenient tree.

     Now, that brings us to macros. By making a macro that runs an external program, you can pass the args of the macro to that, and parse them to AST in it, and manipulate the tree, instead of source-text, then render source code and return that as the result. This is what I want to be integrated more simply into Onyx, which would create the external program automatically from the macro code and do all the formalia. This means that for "true macros" (AST-macros) you get an AST-node (which is a tree in itself if it has sub-nodes) for each arg to the macro.

     How is this useful? Well, let's say you have a macro that requires a list- (array-) literal as argument. First off you can check that the node actually is an array. Then you can iterate the nodes and use for instance item 2 in the array in some specific way, etc. etc. You can do very exact and advanced manipulations of the arguments to the macro before finally generating code (which would be implicitly done too if you return an AST-node instead of a Str).
To also be clear - you can do all checks and validations in meta-code in templates ("paste macros") too; the pro of working with AST is that you can generate the resulting program code via an AST instead of creating strings, which is far more powerful, and for complex macro cases also much simpler. It's much like coding in Lisp, so hopefully some in that crowd can leave that hideous syntax and jump to Onyx ;-)
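A minimal sketch of the AST-macro idea described above, written in Python rather than Onyx or Crystal, with Python's standard `ast` module standing in for "the compiler as a library". The macro name `second_item_macro` and the generated call to `handle` are made up for the illustration; nothing here reflects actual Onyx syntax or internals.

```python
import ast


def second_item_macro(arg_source: str) -> str:
    """Hypothetical 'AST macro': receives the source text of its argument,
    works on the parsed tree, and returns generated source code."""
    # Lex + parse the argument into an AST node (mode="eval" parses one expression).
    node = ast.parse(arg_source, mode="eval").body

    # Validate the argument on the tree instead of on raw text.
    if not isinstance(node, ast.List):
        raise TypeError("macro argument must be a list literal")
    if len(node.elts) < 2:
        raise ValueError("list literal needs at least two items")

    # Use item 2 of the literal in some specific way: wrap it in a call
    # to a (hypothetical) function named `handle`.
    call = ast.Call(
        func=ast.Name(id="handle", ctx=ast.Load()),
        args=[node.elts[1]],
        keywords=[],
    )
    # Render the resulting tree back to source code.
    return ast.unparse(call)


generated = second_item_macro("[a, b + 1, c]")
print(generated)
```

The point of the sketch is only that the check ("is it a list literal?") and the manipulation ("take item 2") happen on nodes, and text is produced once at the very end.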

Did this make it clearer?

To reiterate:

So the big question still. Actual syntax! C++ is the worst example ever (working with templates), since you can't code macros in C++. Crystal macro syntax is very unclear in my eyes. What it comes down to is simply delimiters - that's all.

Feels like I've forgotten something. Well, it will come around.

stugol commented 8 years ago

macro dump_prop

You know, it's quite the coincidence. I wrote pretty much that exact same function in gorillascript just the other day.

Shame gorillascript is dead. I'll have to translate it back into coffeescript at some point.

Now instead imagine you're generating two different if-clauses depending on a meta-level condition.

Nope, you've lost me again.

In crystal you can define def-macros.

I've never understood those. Could you explain further?

By making a macro that runs an external program, you can pass the args of the macro to that...

Why would I run an external program in a macro?

Did this make it clearer?

Not....as such, no ;)

macro would work with the AST for more powerful macro-work

Sounds very useful. But I don't yet understand it :(

I'd be happy to offer suggestions, once I understood it.

ozra commented 8 years ago

(You could always go with LiveScript [it's alive], you may or may not like it more than coffeescript. It has dash-identifiers, but I don't know much about GS for other similarities. Onyx draws a lot of inspiration from LS ;-) )

Nope, you've lost me again.

Contrived pseudo-Onyx code (since there is no macro syntax yet):

template foo(x) =
   {! if x.of? StringLiteral !}
      if string-specific-check {= x =}
         x = string-specific-code-on x  -- note that this is indented 3spc (one indent)
                                        -- more than the rest of the body below
   {! else !}
      if generic-check {= x =}
   {! end !}
      do-things-inside-the-if-body x  -- note that the indent is the same for this as
                                      -- the if-condition-head
      right-here

   -- this is where an `end`-token would be required if so decided
   do-stuff-after-the-if-body

I've never understood those. Could you explain further?

The important difference is in what compile phase the macro is expanded. Macros generally are expanded as "dumb text", well, almost. "def-macros" are expanded at the end of type inference, where all nodes have been typed.

Why would I run an external program in a macro?

Many reasons and different uses. Perhaps you want to set version to a value gotten via git from tags; perhaps you want some data, or a JSON document requested from some web-site, freshly updated for each compile; or, for this case, to extend the compiler and mutate code in a program that returns the mutated code, which is pasted into "this program".
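A rough Python sketch of the "run an external program at macro time and paste its output" idea. The helper `run_macro` is hypothetical, and a tiny child Python process stands in for something like a git query so the example runs anywhere; it is not how Crystal or Onyx implement this.

```python
import subprocess
import sys


def run_macro(command: list[str]) -> str:
    """Hypothetical 'run macro': execute an external program at build time
    and return its stdout so it can be pasted into generated source."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in for asking git for a version tag; a child Python process is
# used here so the sketch runs without a git checkout.
version = run_macro([sys.executable, "-c", "print('0.1.0-example')"])

# The macro result is spliced into the program text as a literal.
generated_line = f"VERSION = {version!r}"
print(generated_line)
```

The same shape covers the AST case: the external program would receive the macro arguments as source text, parse and transform them, and print the resulting source.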

Just keep asking for clarifications as long as I'm not clear enough, I'm not always that good at explaining things.

stugol commented 8 years ago

You could always go with LiveScript

Sadly, I don't like its limitations:

Now instead imagine you're generating two different if-clauses depending on a meta-level condition. Contrived pseudo-Onyx code

I see. You're saying that it's not possible to make the indent level correct for both the macro and the resulting code. And it can't be worked out mathematically because the nesting could be generated in a different place to the indented content - and, indeed, the required nesting could vary depending on previous code generation:

{% if something %}
  if a
    if b
      if c
{% else %}
  if d
{% end %}
  content       -- cannot indent this correctly for both cases
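A small Python sketch of why explicit end-tokens sidestep this: if the pasted code carries its own block terminators, the expander can ignore the indentation found in the template and rebuild it from nesting alone. The toy grammar (only `if ...` opens a block, `end` closes one) and the function names are assumptions for the illustration.

```python
def reindent(lines, indent="  "):
    """Re-indent flat generated code using block openers and explicit `end`
    tokens; the incoming indentation is ignored entirely."""
    out = []
    depth = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "end":
            depth -= 1
            if depth < 0:
                raise SyntaxError("unbalanced `end`")
        out.append(indent * depth + line)
        # In this toy grammar only `if ...` opens a block.
        if line.startswith("if "):
            depth += 1
    if depth != 0:
        raise SyntaxError("missing `end`")
    return out


def expand(something):
    """Hypothetical template: the meta-condition decides how many `if`
    heads are emitted; the shared body and the closing tokens follow."""
    heads = ["if a", "if b", "if c"] if something else ["if d"]
    return reindent(heads + ["content"] + ["end"] * len(heads))


print("\n".join(expand(True)))
print("\n".join(expand(False)))
```

In both branches `content` lands at the right depth, because depth is derived from the tokens and not from how the template author happened to indent the line.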

The important difference is in what compile phase the macro is expanded. Macros generally are expanded as "dumb text"

So the first type of macro is expanded at the start of compilation; and the second type at the end, essentially? So the first type can affect the raw text of the code; whilst the latter type can't, but does have more information to work with?

I think I follow you now. Carry on.

ozra commented 8 years ago

(quick OT on LS here:

no way of doing stuff like for k, v in object

Say what? for .. in - just use for .. of:

for k,v of object
for own k,v of object - only hasOwn... properties

no string interpolation

Say what? It's been there forever: console.log "Hey #{some-var} - I'm interpolatin!".

check out www.livescript.net to ctrl-f the facts

end of side-note.)

ozra commented 8 years ago

Good example with the multiple level-if's variability, highlights the problem even better!

Delimiter thoughts:

Syntax suggestions highly welcome.

stugol commented 8 years ago

I don't know what node-pasting is, but {= ... =} looks fine to me. Looks a bit like the syntax for embedding Ruby in a HAML document.

I agree that variables should be fresh by default. It's just a question of how doable it is. But didn't you say a while back that the design of the language should not be dictated by the difficulty of implementation?

(Looking into LiveScript now. Thanks for the heads-up.)

ozra commented 8 years ago

Pasting is simply taking an argument to the macro and pasting it into the resulting code (though you can run meta code on the arg-node too). For instance, do-stuff <{= literal-list-arg.join(",") =}> would paste a literal-list arg as a tuple-literal (note: in this example with the newly proposed tuple syntax not yet decided on and implemented - which now is implemented, but will be ditched ;-) ).

Another alternative, perhaps, is double back ticks, it clashes less with regular code:

--  - discarded idea -
-- do-stuff <``literal-list-arg.join(",")``>

meta-code blocks should then mirror and perhaps use:

--  - discarded idea -
-- `% if some-condition %`
--    if true
-- `% else %`
--    if false
-- `% end %`
--       do-stuff
--    end

(No shell-execution code begins with %)

But didn't you say a while back that the design of the language should not be dictated by the difficulty of implementation?

Indeed, but impossible is a different beast of difficulty, I don't fling that word around lightly ;-)

stugol commented 8 years ago

The backtick operator can be overridden - for example, to yield HTML instead of shell execution - so we can't rely on anything being invalid inside backticks.

ozra commented 8 years ago

Good catch, was a bit quick on that one.

ozra commented 8 years ago

I've spent the last week's Onyx dev time on getting the internals set up for macroing. And continue on with it.

So, any bright ideas on macroing are of interest! Should something arise that needs a rethink before going through this monster completely.

stugol commented 8 years ago

Why does each translate to each_with_index???

onyx-extended variant where needed

???

ozra commented 8 years ago
  1. It's an experiment to make sure that translations can work in "cross-translation"-situations, if they need be, based on decisions for appropriate names in the stdlib later on, while still sharing the module universe with Crystal-community.
  2. Using the index along with the value is not too uncommon, so simply using it or not should just be the slip of an arg.
  3. It's just as fast (I've benchmarked it), LLVM optimizes away any unused code relating to the index when not used. Which makes an alternative func each-with-index an unnecessary inconvenience (just add an arg and there you go!).
  4. (1) is the only "hard reason" atm, since renaming is a later question and should be weighed holistically, not on a func-by-func basis in isolation. It should all "fit together" "linguistically". But, to mention specifics individually anyway: "filter" seems better than "select", etc.; however, most universally accepted namings are to be preferred imo. And that example I gave is just from my habits.

Some "pseudo-methods and -functions" are also in need of renaming imo. Consider typeof(x), x.class and x.is-a?(Type). typeof would be better off as decltype/typedecl or some similar wording, since it gives the declared type (even if inferred, which makes that wording a bit weird though), and class is not a concept in Onyx, currently, there's just "types". Crystal's "class" is just "reference type", and more importantly, it gives us the specific type a symbol currently holds, so would be better off like of-type, curr-type or somewhere along those lines. is-a? is misleading, since it matches super-types and mixed in traits too, it's currently called of? in Onyx. 1.of? AnyInt holds true for instance, as would 1.of? I32 | I64, or 1.of? Int which is the type the literal number is given by default (Int != Crystal-Int, the latter being called AnyInt in Onyx. Likewise Object is naturally simply called Any in Onyx.). Then again, I should have left this commentary out of this specific issue :-/

onyx-extended...

Crystal obviously doesn't know about Onyx specifics, so the Onyxisms not representable in Crystal must be implemented also on "Crystal-side" so that those constructs can be expressed and parsed when expanded in macros written in Crystal. :-O

Simple example: p for v in list: say v. Extremely contrived, but: p is a macro deffed in crystal stdlib. for doesn't exist as a concept in Crystal. In this specific case it's easily solved since for is re-written to each-looping, and the rewritten nodes are handled fine by crystal, but it highlights the problem in a good way.

stugol commented 8 years ago

2+3: Agreed. Although if each_with_index is not defined in the object, a bare each call should presumably just call the underlying each.

of-type? sounds good in place of is-a?. of? not so much. Suggest it accepts multiple arguments:

of-type?(&obj)(...types) ->
   types.any? ~> obj.class == @1 || obj.class.ancestors.include? @1

say "".of-type? Int, String     -- true

I still don't understand why Crystal macros need to have access to Onyx features.

Since when is x for y in z: say x valid code? And what the hell is p? I've never heard of it. Besides, how can a macro be written that accepts for v in list: say v as argument!?

ozra commented 8 years ago
  1. Yeah, that's a reasonable backup-route to take automatically.
  2. I'll leave the name discussions for its own issue, as said: my bad for introducing it here.
  3. Any feature available in Onyx, but not in Crystal, must be able to render as "OnyxCrystal" for it to go through a crystal macro.
  4. Since forever. p is like say or puts pretty much, but a macro. You could use a plain func say for instance, to explain that syntax:
say for y in [1,2,3]: say y

You call say, with one argument, which is a for-expression. The for-expression is executed. It spits out 1, 2, 3 in order. Then it returns the iterated list. Which the first say spits out. Nothing strange. Output:

1
2
3
[1, 2, 3]
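The evaluation order can be mimicked in Python, assuming (as described above) that a for-expression evaluates to the list it iterated. `for_each` and `say` are stand-ins written for this sketch, not Onyx functions.

```python
def for_each(items, body):
    """Stand-in for an expression-valued `for`: run `body` on every item,
    then evaluate to the iterated list itself."""
    for item in items:
        body(item)
    return items


printed = []


def say(value):
    """Stand-in for `say`: record and print the value."""
    printed.append(str(value))
    print(value)


# Equivalent in spirit to: say for y in [1,2,3]: say y
say(for_each([1, 2, 3], say))
```

The inner calls run first because the argument has to be evaluated before the outer `say` can be called with its result.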
stugol commented 8 years ago

my bad

twitch

Please refrain from using that expression. It's the stupidest and most grammatically-incorrect Americanism I've ever encountered, and is most irritating.

3: But I don't understand why the macro needs to understand Onyx:

values = for v in list: say v
p values     -- macro `p` has no need of Onyx knowledge!
ozra commented 8 years ago

Haha, it's terse, I like it. I'll refrain from it to save you a fit then B-)

my-macro values is not the same as my-macro for.... The macro gets the parsed arguments' actual AST-nodes passed in, that's kind of the point of a macro. They are expanded as needed in the macro, forming source code from the source "parts" already in the macro, forming correct (hopefully, if the macro was written right...) code. A for-expression would be parsed to a For-node (before rewrites that is), which is non-existent in Crystal (except, ironically, in macro-meta-code, which is a MacroFor-node).

stugol commented 8 years ago

Hm. So what's the solution?

ozra commented 8 years ago

Simple, just a little tedious: make syntax (can be ugly as fuck, doesn't matter) that won't clash with current (and preferably future, for less refactoring) crystal syntax so that crystal can be made to support the Onyx-constructs. Since this code will never be seen (it's just used for internal rendering/parsing), they can be made very verbose. So, just a bit more of coding to do, hehe. I might take another break from the branch to have a look at implementing the anonymous types issue in a few days.

ozra commented 8 years ago

It's now implemented to the stage where macros can be coded in Onyx (and more powerfully than in Crystal already, since asymmetry in generated 'if's etc. in meta-conditional macro-bodies is possible, as shown in different examples above). It's an early push, and I'll probably find a lot of bugs tomorrow B-)

ozra commented 8 years ago

BTW, regarding the delimiters. They're "good enough", and better delimiters could instead be chosen in the Unicode-realm.

ozra commented 8 years ago

I'm considering introducing a literal, "non hygienic" form of macro into Onyx also.

It would substitute the placeholder everywhere at parse-time, and as such be pretty much as "search n replace" in a text-editor, just barely smarter (avoiding comments and strings at least).
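A minimal Python sketch of such a "barely smarter search n replace", under a few loud assumptions: `--` starts a line comment, strings are double-quoted with backslash escapes, and identifiers are plain `[A-Za-z_][A-Za-z0-9_]*` (Onyx's dash-identifiers are ignored for simplicity). The function name `substitute` is invented for the example.

```python
import re

# One alternative per kind of chunk; strings and comments are listed first
# so that identifiers inside them are never seen as identifiers.
TOKEN = re.compile(
    r'''
    (?P<string>"(?:\\.|[^"\\])*")         # double-quoted string with escapes
  | (?P<comment>--[^\n]*)                 # line comment (assumed `--` prefix)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)     # simplified identifier
    ''',
    re.VERBOSE,
)


def substitute(source: str, placeholder: str, replacement: str) -> str:
    """Replace whole-identifier occurrences of `placeholder`, leaving
    string literals and comments untouched."""
    def swap(match: re.Match) -> str:
        if match.lastgroup == "ident" and match.group() == placeholder:
            return replacement
        return match.group()  # strings, comments, other identifiers: as-is

    return TOKEN.sub(swap, source)


code = 'x = RESET  -- RESET here is a comment\nsay "RESET stays"\nRESETTER = RESET'
expanded = substitute(code, "RESET", "set_defaults()")
print(expanded)
```

Nothing here is hygienic: the replacement text is pasted verbatim and can capture or clash with names at the use site, which is exactly the risk raised in the replies.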

Will issue later on.

Sod-Almighty commented 8 years ago

Basically a C++ #define? Are you certain that is wise? They're one of the Three Most Crap Things™ about C++.

(the others being header files and templates)

ozra commented 8 years ago

Haha. Yes, I agree, it is a bad idea. The idea is very loose so far, and iff needed, then there will be rules on hideously long names to avoid unnecessary mishaps. The only reason is for code-reuse in multiple init() -> and re-init() functions without causing "nilable" problems with ivar-type-inference.

Hopefully I figure out some cleaner way of avoiding above problems.

Bringing a shitty solution like this up publicly, can kick off the brain cells to come up with the right solution instead ;-)

Sod-Almighty commented 8 years ago

Multiple init?

ozra commented 8 years ago

Say: "reset"-functions with common code that sets ivars to a known state, that may be used from multiple constructors and also for re-initing an instance when using instance-pooling.

Sod-Almighty commented 8 years ago

I think you'll have to explain this a bit more. It's the first I'm hearing of any of this.

ozra commented 8 years ago

I've figured out what to do about it. The evil macros won't be necessary. Anyway, example of reason:

type Foo
  @a I32
  @b I32

  init() ->
    @a = 1
    reset

  reset() ->
    @b = 2
end

f = Foo()

The type inference for constructors is much dumber than the common inference - the above won't compile, "@b can be nil". I'll hope for some helpful action from the crystal camp here, because it can be solved. I'll issue it there. Otherwise an alternative semantic phase will be added specifically for Onyx.
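A toy Python model of the difference, assuming the error arises because the constructor analysis only looks at assignments written directly in init and does not follow the call to reset. That cause is an assumption made for this sketch; the thread does not spell out what the compiler actually does. Branching and conditionals are ignored.

```python
def assigned_ivars(methods, entry, follow_calls):
    """Collect ivars assigned when `entry` runs. `methods` maps a method name
    to a list of steps: ("assign", ivar) or ("call", method_name)."""
    assigned, seen, stack = set(), set(), [entry]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guard against recursive calls
        seen.add(name)
        for kind, target in methods[name]:
            if kind == "assign":
                assigned.add(target)
            elif kind == "call" and follow_calls:
                stack.append(target)
    return assigned


# Toy model of the Foo type above.
foo_methods = {
    "init": [("assign", "@a"), ("call", "reset")],
    "reset": [("assign", "@b")],
}
declared = {"@a", "@b"}

shallow = assigned_ivars(foo_methods, "init", follow_calls=False)
deep = assigned_ivars(foo_methods, "init", follow_calls=True)

print("possibly nil (shallow):", sorted(declared - shallow))
print("possibly nil (deep):", sorted(declared - deep))
```

With the shallow pass @b is reported as possibly nil; once calls made from the constructor are followed, every declared ivar is known to be set.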