mustache / spec

The Mustache spec.
MIT License

Idea: power lambdas #135

Closed jgonggrijp closed 10 months ago

jgonggrijp commented 2 years ago

2023-10-31 update: the newest version of my proposal is here.

2022-10-21 update: the following excerpt from https://github.com/mustache/spec/issues/138#issuecomment-1286225095 summarizes how I currently envision power lambdas.

The way I currently think of it, lambdas receive a second argument which somehow (i.e., in an implementation-defined way) makes the following things possible:

  • retrieve any context frame;
  • retrieve a list of all keys visible in the current (full) context;
  • resolve any key against the current (full) context;
  • identify the context frame in which a key is resolved;
  • render a template of choice against the current context.

Whereby lambdas must not modify the pre-existing contents of the context, and implementations are welcome to actively prevent this if the programming language can enforce it. However, lambdas can (already) push a new frame on the stack, which still has the net effect of changing what's available in the context.
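To make the capabilities listed above concrete, here is a minimal JavaScript sketch of what such a second argument could look like. The object shape and method names (`frame`, `keys`, `resolve`) are invented for illustration and are not part of the proposal; the spec would leave all of this implementation-defined.

```javascript
// Hand-rolled stand-in for the implementation-defined second argument.
function makeContextView(stack) {
    return {
        // retrieve any context frame by index (bottom of stack = 0)
        frame: (i) => stack[i],
        // list all keys visible anywhere in the (full) context
        keys: () => {
            const seen = new Set();
            for (const f of stack) for (const k of Object.keys(f)) seen.add(k);
            return [...seen];
        },
        // resolve a key against the full context, top frame first
        resolve: (key) => {
            for (let i = stack.length - 1; i >= 0; i--) {
                if (key in stack[i]) return stack[i][key];
            }
            return undefined;
        }
    };
}

// A "power lambda" receiving the view as its second argument:
function shout(section, view) {
    return section.toUpperCase() + ' ' + view.resolve('name');
}

const view = makeContextView([{ name: 'John' }, { name: 'Lizzy' }]);
console.log(shout('hello', view)); // prints "HELLO Lizzy"
```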

2022-10-16 update: added playground links and savestates.

This is a feature I've been meaning to implement for a while, and which I was planning to propose for the spec after implementing it. However, it seems relevant to the discussion about dynamic partials currently taking place with @anomal00us, @gasche and @bobthecow in #134 and #54, so I decided to describe the idea now.

The context name resolution algorithm described in the current interpolation, section and inverted section specs states that if any name in a dotted path happens to refer to a function (method), that function is called in order to obtain the next frame from which to resolve the next name in the path (or the final result). In case of a section or inverted section, the function is also passed the contents of the section as a string argument, if it is capable of accepting an argument. Illustration with JavaScript lambda code:

{{#key.lambda.otherKey}}Hello {{name}}{{/key.lambda.otherKey}}
{
    key: {
        lambda(section) {
            // section will be the string 'Hello {{name}}'
            return {
                otherKey: { name: 'John' }
            };
        }
    }
}
Hello John

Try the above example in the playground by pasting the following code:

{"data":{"text":"{\n    key: {\n        lambda(section) {\n            // section will be the string 'Hello {{name}}'\n            return {\n                otherKey: { name: 'John' }\n            };\n        }\n    }\n}"},"templates":[{"name":"","text":"{{#key.lambda.otherKey}}Hello {{name}}{{/key.lambda.otherKey}}"}]}

The lambdas spec adds to this that if the last name in the path is a function and it returns a string, then that string is rendered as a template before finally being substituted for the interpolation or section tag.

{{#key.lambda}}Hello {{name}}{{/key.lambda}}
{
    key: {
        lambda(section) {
            // section will be the string 'Hello {{name}}'
            return 'I changed my mind about the template, {{name}}';
        }
    },
    name: 'John'
}
I changed my mind about the template, John

Try the above example in the playground by pasting the following code:

{"data":{"text":"{\n    key: {\n        lambda(section) {\n            // section will be the string 'Hello {{name}}'\n            return 'I changed my mind about the template, {{name}}';\n        }\n    },\n    name: 'John'\n}\n"},"templates":[{"name":"","text":"{{#key.lambda}}Hello {{name}}{{/key.lambda}}"}]}

I would like to suggest another optional extension spec on top of this, which I'll dub "power lambdas" for now. It amounts to passing an additional argument to functions that can accept it, containing a (read-only) view of the full context stack. I'll illustrate a couple of advanced features that could be implemented using power lambdas below. For now I'll assume that the context stack is passed as the most conventional sequence type of the programming language, with the top of the context stack as the last element. It could also be left to the implementation whether the stack frames are ordered top or bottom first.

Access to properties of lower context stack frames that have been hidden by higher frames:

{{#nested1}}
    {{#nested2}}
        {{name}}
        {{parentScope.name}}
        {{rootScope.name}}
    {{/nested2}}
{{/nested1}}
{
    parentScope(section, stack = section) {
        // In an interpolation there is no section string, so (presumably)
        // the stack arrives as the first argument and the default kicks in.
        return stack[stack.length - 2];
    },
    rootScope(section, stack = section) {
        return stack[0];
    },
    name: 'John',
    nested1: {
        name: 'Lizzy',
        nested2: {
            name: 'Deborah'
        }
    }
}
        Deborah
        Lizzy
        John

Hash iteration by key-value pairs like in Handlebars's #each helper:

{{#nested}}{{#byKey}}
    The {{key}} is called {{value}}.
{{/byKey}}{{/nested}}
{
    byKey(section, stack) {
        var context = stack[stack.length - 1];
        var pairs = [];
        for (var key in context) {
            pairs.push({key, value: context[key]});
        }
        return pairs;
    },
    nested: {
        'police officer': 'Jenny',
        'doctor': 'Hildegard',
        'hairdresser': 'Tim'
    }
}
    The police officer is called Jenny.
    The doctor is called Hildegard.
    The hairdresser is called Tim.

Dynamic partials:

{{!template.mustache}}
{{animal}} goes:
    {{#dereference.>}}animal{{/dereference.>}}

{{!cow.mustache}}
Moo!

{{!dog.mustache}}
Woof!
{
    dereference(section, stack) {
        // This example presumes that there is a way to render a template with a prepared
        // template stack instead of just a plain view that will become the root of a new
        // context stack. Offering such an interface will be attractive for implementations
        // that support power lambdas.
        var name = renderMustache('{{' + section + '}}', stack);
        return {
            '>': function() { return '{{>' + name + '}}'; }
            // I'm oversimplifying here, you could do this for all sigils without code
            // duplication.
        };
    },
    animal: 'cow'
}
cow goes:
    Moo!
agentgt commented 10 months ago

its misleading. you wrote earlier {{#block}}..{{/block}} but its not the block, the block is {{$block}}..{{/block}} according to the docs

As far as nomenclature I agree with @jgonggrijp recommendations of naming and agree that "block" is confusing but this happens all the time with tech specs.

Your use of the word block would be confusing to most Mustache implementation developers.

What you call a block is what I call "section-like". As far as what Oxford says I don't think it is fruitful at this point as consensus has been met that "Section" is what it is. I don't think the manual is that bad on this regard. I just don't think a lot of confusion is had here but I'm sure @jgonggrijp will take PRs on better clarification. (if anything the word sigil which is not in the manual is probably the most confusing part :) ).

Speaking of which, while that picture is incredible (mad kudos on that), what exactly you tokenize and how is not really part of the spec, and there seems to be a lot of back and forth on that, including splitting the content into two parts, which btw is exactly what Handlebars does.

I urge you to think of this in terms of backward compatibility and what other implementations may or may not have to do.

Why does Mustache need the extra end tag specified? Why does XML or HTML need it? Obviously they don't as you are right you can make rules that if you see the / it should be the end of the stack but Mustache was originally made to template HTML/XML so that is probably why it is like that. That is also where it gets the terms like "tag" from.

Anyway I am curious about the pipes @jgonggrijp. I sort of implement that, as I mentioned earlier, in my language's implementation with the regular dot.

Assume the top of the stack is given to the lambda then are these different?

{{#numbers.iterate}}
...snip...
{{/numbers.iterate}}

{{#numbers | iterate}}
...snip...
{{/numbers | iterate}}

@jgonggrijp I think one thing I forgot to mention in our previous conversations on lambda is that I make lambda tags at the end of a dotted path still resolve even if they are not children. Thus a lambda at the root of the object tree (which is always at the bottom of the stack!) called iterate would be found even if not a child of numbers. I can't recall if that is correct to spec or I put that in (tired).

That being said it looks like | fixes my previously mentioned "list" problem.

That is

{{#numbers.iterate}}
{{/numbers.iterate}}

In my implementation this would execute the lambda iterate once for each item in the numbers list.

So I like this

{{#numbers | iterate}}
....

Makes that possible aka pass the whole list!

determin1st commented 10 months ago

@agentgt

I expect the perf on most dynamic implementations particular in scripting languages aka REFLECTIVE to be about the same and about one order of magnitude difference which looks to be the case here.

Do you have the test code you used for the benchmark?

It's hard to make any sort of apples to apples comparison especially if threading comes into play and implementation details but here is a Java templating benchmarking test of various template engines. In Java we are talking nanoseconds here for template rendering: https://github.com/agentgt/template-benchmark (JStachio renders somewhat complex templates a million times a second on so-so hardware).

there's no threading in NODE or PHPs, they are single-threaded engines. NODE's JIT is very advanced and agressive, better than PHPs for now. tests are here: https://github.com/determin1st/sm-utils/tree/main/tests/mustache

clone entire repo. i did npm i mustache, npm i wontache, npm i hogan.js and other libs there. my version should work without any configurational magic. check speed.* files, the entire line of mustaches that compete in speed is in the shellscript (convert it into your OS variant). i run those with PHP 8.1 JIT enabled

opcache.jit=function
opcache.enable_cli=1

what speed test does:

i wish you luck rising above (completing faster) of any of those :] if there's a glitch or something you see as incorrect measurement - let me know, ill fix that. maybe drop your testfiles that you say are very complicated and run millions per second.

its misleading. you wrote earlier {{#block}}..{{/block}} but its not the block, the block is {{$block}}..{{/block}} according to the docs

As far as nomenclature I agree with @jgonggrijp recommendations of naming and agree that "block" is confusing but this happens all the time with tech specs.

Your use of the word block would be confusing to most Mustache implementation developers.

What you call a block is what I call "section-like". As far as what Oxford says I don't think it is fruitful at this point as consensus has been met that "Section" is what it is. I don't think the manual is that bad on this regard. I just don't think a lot of confusion is had here but I'm sure @jgonggrijp will take PRs on better clarification. (if anything the word sigil which is not in the manual is probably the most confusing part :) ).

well, english existed prior to mustache spec, so which consensus is more overwhelming? i put an equality sigil between {{$block}} and {{#section}} because they follow the same pattern, imo, more wording doesnt form consistency, the opposite. thus tags i see are {{$name}} and {{#name}}

Speaking of which, while that picture is incredible (mad kudos on that), what exactly you tokenize and how is not really part of the spec, and there seems to be a lot of back and forth on that, including splitting the content into two parts, which btw is exactly what Handlebars does.

I urge you to think of this in terms of backward compatibility and what other implementations may or may not have to do.

if you run the test once perf you'll see fails=4, is where handlebars doesnt follow the spec (from this repo), so you should point exactly to the terms you think must or should be compatible.

Why does Mustache need the extra end tag specified? Why does XML or HTML need it? Obviously they don't as you are right you can make rules that if you see the / it should be the end of the stack but Mustache was originally made to template HTML/XML so that is probably why it is like that. That is also where it gets the terms like "tag" from.

in HTML, name tag or keyword or tag - matters, it says what type is the element, what set of render rules applies. in mustache, tags refer to data, not render rules. type and rendering is determined differently. {{div}} doesnt need {{/div}} in mustache, it doesnt form a pair

agentgt commented 10 months ago

there's no threading in NODE or PHPs, they are single-threaded engines. NODE's JIT is very advanced and agressive, better than PHPs for now. tests are here: https://github.com/determin1st/sm-utils/tree/main/tests/mustache

By threading I mean overall throughput. I'm not sure about PHP, but while Node may only execute user code single-threaded, it uses multiple threads. So a truer test would be to load up a Node.js server and hit it with some benchmarking tool.

The idea is to test if there is some sort of contention that may or may not be caused by how the JIT optimizes the code.

Speaking of which, benchmarking across languages is incredibly hard. For one, by the looks of it your benchmarking does not do any JIT warm-up.

selects loop variant, for example comipile()+render() (STATIC as you say) or render() (REFLEXIVE!) or prepare() (FULLY DYNAMIC ACTION!)

No it isn't STATIC. Please go back and reread. Static requires the templating engine know the structure of the model. The spec tests (I think) which you are using for benchmarking are not constrained models other than JSON.

i wish you luck rising above (completing faster) of any of those :] if there's a glitch or something you see as incorrect measurement - let me know, ill fix that. maybe drop your testfiles that you say are very complicated and run millions per second.

I'm not a Javascript expert. I can assure you that if you reread my point on STATIC it will be more obvious why implementations like that run faster.

maybe drop your testfiles that you say are very complicated and run millions per second.

And I said somewhat complicated.

Lets back up for a second. What do people do without templating languages?

They walk a data structure concatenating a string correct?

That is your test should at least include that naive but probably fastest implementation for both languages right (PHP and JS)?

Benchmarking templating languages across languages on a single thread IMO is fairly pointless as eventually you just hit whatever limit the language has on IO.... or you're just testing JIT warm-up, which appears to be the case here.

EDIT when I say languages I don't mean other Mustache like languages. I mean programming languages.

I'll add one more flaw with the benchmark: you need to test non-latin1 characters. Most programming languages are highly optimized for ASCII or latin1, but in the real world people use things like emojis or other languages (spoken/written languages) besides English.

JStachio does special optimizations on this front by pre-encoding the static parts of the template into UTF-8 bytes. This is because encoding strings into bytes is a non-trivial cost and if you can do that upfront it can make a large difference.

I seriously doubt any of the Javascript implementations or even yours does this and part of this is because why would they need to since they just hand off a string.

However for a true real world benchmark it should matter hence why I say you need to load up some server or something similar where you get multiple threads and output going on.

If you want to take that dive you can check out techempower benchmark framework and modify it to only do HTML output: https://github.com/TechEmpower/FrameworkBenchmarks

in HTML, name tag or keyword or tag - matters, it says what type is the element, what set of render rules applies. in mustache, tags refer to data, not render rules. type and rendering is determined differently. {{div}} doesnt need {{/div}} in mustache, it doesnt form a pair

Your overall contention with wording of the spec and what the community has formed consensus around regardless of what Oxford says is just not that germane to power lambdas which is the issue of this thread. I recommend making a new issue that the wording/terminology/jargon could be improved. BTW Mustache was designed by the founder of github who is American and in America the standard for US English is Merriam Webster and not Oxford (I mean if we are going to be contentious and pedantic about this let us use the right dictionary πŸ˜„ ).

agentgt commented 10 months ago

To get us back on track on power lambdas, I am really liking the | notation as it disambiguates a regular old-school lambda call from the newer power lambdas. It also fixes the list problem (aka do not iterate but just take this node (top of the context stack) regardless of whether it is a boolean or a list).

In my implementation I have annotations doing the disambiguation, but in other languages that may be much harder to determine, particularly languages without typing and with much more variable arguments.

@jgonggrijp Do you have any wontache branches or work experimenting with this | notation?

I would like to propose if we do | then we make it so that each power lambda that is doing filtering or whatever returns essentially a two tuple of

[ node, template ]

node here is a new object that is now pushed onto the stack, and template is the template, like a partial, that will be rendered with the new context stack (when I say new I just mean the full stack but with node now on top of the stack).

Whether or not a lambda actually returns a tuple or sets some values on an input object is implementation detail.

That is in Javascript it might be:

function(helper) {
  var top = helper.top; // top of stack.
  var someNodeThatHelpsRenderPerhaps = doLogic(top);
  helper.push(someNodeThatHelpsRenderPerhaps);
  helper.template('{{#.}}before{{> *}}after{{/.}}'); // iterate over list. 
}

Or

function(top) {
  var someNodeThatHelpsRenderPerhaps = doLogic(top);
  var template = '{{#.}}before{{> *}}after{{/.}}';
  return [someNodeThatHelpsRenderPerhaps, template];
}

Again {{> *}} here just means include the section body (the content between what you call blocks @determin1st) like a partial.

If template is missing then the section body is the template.
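To make that concrete, here is a toy sketch of how a renderer might consume the [ node, template ] tuple. The helper names (renderTemplate, renderPowerSection) are invented for illustration, and only plain {{key}} interpolation is supported; this is not a proposal for any real implementation's API.

```javascript
// Resolve a key against the context stack, top frame first.
function resolve(stack, key) {
    for (let i = stack.length - 1; i >= 0; i--) {
        if (key in stack[i]) return stack[i][key];
    }
    return '';
}

// Toy template renderer: only handles {{key}} interpolation.
function renderTemplate(template, stack) {
    return template.replace(/\{\{(\w+)\}\}/g, (_, key) => resolve(stack, key));
}

// Render a power-lambda section from the proposed two-tuple.
function renderPowerSection(lambda, stack, sectionBody) {
    const top = stack[stack.length - 1];
    const [node, template] = lambda(top);
    stack.push(node);                         // node becomes the new top frame
    const out = renderTemplate(template || sectionBody, stack);
    stack.pop();
    return out;
}

// A power lambda that supplies both a node and a replacement template:
const lambda = (top) => [{ greeting: 'Hi' }, '{{greeting}} {{name}}!'];
console.log(renderPowerSection(lambda, [{ name: 'John' }], 'ignored'));
// prints "Hi John!"
```

If the lambda omits the template, the section body is used instead, matching the fallback rule described above.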

I'm using JS as the lingua franca to show how I think it should work.

Of course I don't know what should happen if someone accidentally does:

{{#context.powerLambda}}
{{/}} {{! to placate @determin1st }}

But perhaps that is just a detail that the spec does not need to cover.

EDIT I guess where it also gets really tricky is if multiple piping is allowed.

{{#context | lambda1 | lambda2 }}
some section
{{/}}

Does lambda1 just not get to participate in template creation? e.g. lambda2 just gets the new top of the node stack generated from lambda1 or something more exotic?

Or do we just limit one |?

EDIT 2:

For @jgonggrijp and @determin1st: if you are wondering how JStachio does this and how others can do it fast, it is that when the template gets compiled it gets compiled with the model, and templates are guaranteed to be static.

Thus the above is more like:

const template = magic('{{#.}}before{{> *}}after{{/.}}');
function(top) {
  var someNodeThatHelpsRenderPerhaps = doLogic(top);
  return [someNodeThatHelpsRenderPerhaps, template];
}

That is, JStachio can walk the complete object graph and template a priori, and what it generates (and this is important, @determin1st, to the whole STATIC point) is a giant single function in Java (actual Java code).

So a root template that is

{{#lambda}}
some contents {{node}}
{{/lambda}}

With lambda with a static template of:

before 
{{> * }} 
after

Gets sort of preprocessed into:

{{#lambda}}
before
some contents {{node}}
after
{{/lambda}}

That is in JStachio there is no compiling of partial templates separately. It is truly like an include.

The code that gets generated is sort of like (translated to Javascript):

function(buffer) {
  var current = ... 
  var result = lambda(current);
  buffer.append('before\n');
  buffer.append('some contents ');
  buffer.append(escape(result.node.toString()));
  buffer.append('\n').append('after');
};

The context stack in JStachio is more like the programming lexical stack of a function (aka lexical scoping).

And because Java is statically typed there is none of this:

var result = lambda(current);
var wrapper = resolve(result); // is it a list, or a string or whatever.
// do stuff with wrapper.

That is why while it would be fun to go crush your benchmark with a statically compiled version using JStachio I don't need to as I'm fairly sure even a Javascript version that does not have to introspect the type would be faster.

However the major rule to make this all work is that templates cannot be created dynamically.

determin1st commented 10 months ago

By threading I mean overall throughput. I'm not sure on PHP but Node may only execute user code single threaded it uses multiple threads. So a more true test would be to load up a Node js server and hit it with some benchmarking tool.

what throughput? its a loop over templates in the memory (RAM).

The idea is to test if there is some sort of contention that may or may not be caused by how the JIT optimizes the code.

yes, JIT gives a boost, aggressiveness of JS is seen at the 1000 => 10000 transition. it doesnt multiply by 10, while PHPs does. but it doesnt influence the result.

Speaking of which, benchmarking across languages is incredibly hard. For one, by the looks of it your benchmarking does not do any JIT warm-up.

those arent JIT tests, those are templating engine tests.

No it isn't STATIC. Please go back and reread. Static requires the templating engine know the structure of the model. The spec tests (I think) which you are using for benchmarking are not constrained models other than JSON.

The link was to a Cambridge dictionary, not Oxford. The engine obtains knowledge by parsing the template and doing the first render, so it becomes static. Otherwise, what does it become, what category?

I'm not a Javascript expert. I can assure you that if you reread my point on STATIC it will be more obvious why implementations like that run faster.

i know why it runs faster, i did some speedtest :] why i need to be a javascript expert, i dont understand

Lets back up for a second. What do people do without templating languages?

They walk a data structure concatenating a string correct?

That is your test should at least include that naive but probably fastest implementation for both languages right (PHP and JS)?

they write render functions by hand. i started to write error log renderers but soon realized that an upgrade to my mustache implementation would be more appropriate. i bet the performance outcome of handwritten renderers would be similar to preset template render (name it static).

Benchmarking templating languages across languages on a single thread IMO is fairly pointless as eventually you just hit whatever limit the language has on IO.... or you're just testing JIT warm-up, which appears to be the case here.

EDIT when I say languages I don't mean other Mustache like languages. I mean programming languages.

then retract your words that your implementation is the fastest :] folks benchmark languages all the time. having the same system api you acquire two timestamps and subtract them, no rocket science in there.

I'll add one more flaw with the benchmark: you need to test non-latin1 characters. Most programming languages are highly optimized for ASCII or latin1, but in the real world people use things like emojis or other languages (spoken/written languages) besides English.

JStachio does special optimizations on this front by pre-encoding the static parts of the template into UTF-8 bytes. This is because encoding strings into bytes is a non-trivial cost and if you can do that upfront it can make a large difference.

I seriously doubt any of the Javascript implementations or even yours does this and part of this is because why would they need to since they just hand off a string.

yes, i dont think its needed. otherwise, show some example, some use case to probe.

However for a true real world benchmark it should matter hence why I say you need to load up some server or something similar where you get multiple threads and output going on.

If you want to take that dive you can check out techempower benchmark framework and modify it to only do HTML output: https://github.com/TechEmpower/FrameworkBenchmarks

just load some JSON files into the memory and loop over them, why servers clusters frameworks dockers rockers.. its unnecessary for a simple test. i think adding to the shellscript something like

:: JMustachious superstatic
java speed.java 1000

will do the job

Your overall contention with wording of the spec and what the community has formed consensus around regardless of what Oxford says is just not that germane to power lambdas which is the issue of this thread. I recommend making a new issue that the wording/terminology/jargon could be improved. BTW Mustache was designed by the founder of github who is American and in America the standard for US English is Merriam Webster and not Oxford (I mean if we are going to be contentious and pedantic about this let us use the right dictionary πŸ˜„ ).

BEHOLD: https://www.merriam-webster.com/dictionary/section a distinct part or portion of something whats something? something like a block. im bashing currents to improve.. collect fruits for my own implementation and suppose you acting similar. if you want to adopt something named "filter" right into the core, no problemo, do as you wish, but dont forget to put a single empty space in my implementation to invoke that craziness. definitely you may later split on | as much as possible, but i myself prefer commas to separate things in a line.

if you are wondering how JStachio does this and how others can do it fast, it is that when the template gets compiled it gets compiled with the model, and templates are guaranteed to be static.

Thus the above is more like:

const template = magic('{{#.}}before{{> *

probably i wasnt clear enough that my brain apparatus turns off on mustache symbols >*< or words like "partial". its a twilight zone for me, i wont be able to catch up with those. also you introduced a node without definition.. i really didnt understand the rest part.

agentgt commented 10 months ago

I am sorry I just don't think this talk of performance is fruitful especially how much that varies and we are talking about a templating language across different programming languages.

Benchmarking is all lies and I'm just saying my lies are better because I take more things into account.

then retract your words that your implementation is the fastest :] folks benchmark languages all the time. having the same system api you acquire two timestamps and subtract them, no rocket science in there.

Because that is not the real world at all. You can't do micro benchmarking like that. I just can't continue explaining when again it isn't even germane. How about I retract my statement that my Mustache implementation is the fastest?

probably i wasnt clear enough that my brain apparatus turns off on mustache symbols >*< or words like "partial". its a twilight zone for me, i wont be able to catch up with those. also you introduced a node without definition.. i really didnt understand the rest part.

BEHOLD: https://www.merriam-webster.com/dictionary/section a distinct part or portion of something whats something? something like a block.

I don't care. This is mustache not @determin1st special language. I'm trying to work with the constructs of the existing spec without doing drastic changes of which includes not adding {{|}} or not believing in partials or whatever makes your implementation happy.

I don't care if we call sections ASDFWEFQWEFAEWF.

im bashing currents to improve.. collect fruits for my own implementation and suppose you acting similar.

No I'm not. I'm trying to help you or at least was and I'm trying to improve the spec without getting caught in fairly non important things for what is supposed to be a minor extension.

All I have seen over and over is I do this shit in my language that is completely different than Mustache because integers of sections blah blah blah what I think is performance etc.

With the exception of Lambda enhancement JStachio follows the spec. It is really hard because your implementation appears to be all over the place.

As for your benchmarks I don't have the time to try to get your PHP scripts to work running Java. That is why I pointed to Techempower because you need a whole loading framework to do that kind of testing. You don't have to load up databases but you will probably need docker.

And yeah man I speak from fuck loads of experience including powering some extremely high traffic sites so I am arrogant on this and I apologize, but you asked what I thought of your perf tests.

Since you have the time why don't you go look at this and port over to PHP/JS instead: https://github.com/agentgt/template-benchmark . Oh that isn't a benchmark I came up with. It is used all over for Java templating engines and I can tell you it isn't ideal precisely because it isn't real world.

I still have no idea how your implementation works other than some preprocessing. Does it use reflection or do you expect a model of specific types passed to the rendering engine? Not clear, but I suppose it really doesn't matter at this point.

i really didnt understand the rest part.

ditto for almost all your comments. I just have a hard time with your writing style. It is probably my fault as my reading comprehension sucks.

I just honestly don't have the patience like @jgonggrijp who is a saint. So I may just page out for a while till we go back to how we can get power lambdas in.

Advance apologies.

jgonggrijp commented 10 months ago

@determin1st

  • "tag pair" is irrelevant, it can be {{#hello}}..{{/world}} now as i changed the pairing algorithm

Yes, I'm aware that your implementation is different, but this is why I'm trying to establish common vocabulary with you. It would be nice if we could use the same words and mean the same things.

its misleading. you wrote earlier {{#block}}..{{/block}}

Did I? Where?

but its not the block, the block is {{$block}}..{{/block}} according to the docs

That was exactly my point.

naah, each section has its own type, so they arent only content. branches grow from the tree, one from another. blocks arent trees, they are lists. i simply go to the dictionary: https://dictionary.cambridge.org/dictionary/english/section one of the parts that something is divided into

That may be, but Section already has a different meaning in the spec, so you're still going to need a different word. Maybe "segment"?

  • Block's terminator: the spec would call this "End Section Tag" if referring to the whole tag or "slash" if referring only to the sigil.

okay, but section doesnt always end with the {{/}} it may end at the start of another section.

That is your terminology. I was talking about the spec's terminology. Any common terminology will have to be non-conflicting with the spec.

did some performance mesurements, @agentgt you may be interested in those perf

What is the difference between the two bottom wontache rows? They have wildly different numbers.

@agentgt

(...) but I'm sure @jgonggrijp will take PRs on better clarification.

Yes I will!

I urge you to think of this in terms of backward compatibility and what other implementations may or may not have to do.

I second this @determin1st.

Why does Mustache need the extra end tag specified? Why does XML or HTML need it? Obviously they don't as you are right you can make rules that if you see the / it should be the end of the stack but Mustache was originally made to template HTML/XML so that is probably why it is like that. That is also where it gets the terms like "tag" from.

It is not only because of HTML, and HTML/SGML/XML is not the only language besides Mustache that does this. For example, many shell languages do for-endfor, case-esac and if-fi. In C and C++, you see include guards following the voluntary convention #ifndef XYZ-#endif /* XYZ */. In many programming languages with Pascal-like (word-based) syntax, such as Ada, you can also do if-end if and the like. The motivation in all these cases is explicitness. The information is technically redundant, but it helps the human brain parse what is going on. Many technologies intentionally incorporate redundancy, even if the redundant data are never seen by human eyes; it makes technology more robust in general as well.

Assume the top of the stack is given to the lambda then are these different?

{{#numbers.iterate}}
...snip...
{{/numbers.iterate}}

{{#numbers | iterate}}
...snip...
{{/numbers | iterate}}

Yes, these are different. According to the spec, the first example can only work if iterate is a member of numbers, and iterate can only access numbers through an implicit this or self binding (if the programming language and the implementation support that).

There is no official spec for the second example yet, but as @bobthecow described it, iterate can be anywhere on the context stack and it will receive numbers as its explicit, first and only argument.
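The contrast between the two forms can be sketched in JavaScript. This is purely illustrative: the `filters` registry, the `list` member and the substitution logic are invented here; neither form is spec'd as shown.

```javascript
// 1. {{#numbers.iterate}}: iterate must be a member of numbers, and can only
//    reach numbers through the implicit `this` binding.
const view = {
  numbers: {
    list: [1, 2, 3],
    iterate(sectionText) {
      // `this` is the numbers object
      return this.list.map(n => sectionText.replace('@', n)).join('');
    },
  },
};

// 2. {{#numbers | iterate}}: iterate may live anywhere on the context stack
//    and receives the resolved value of numbers as its explicit argument.
const filters = {
  iterate: numbers => sectionText =>
    numbers.list.map(n => sectionText.replace('@', n)).join(''),
};

view.numbers.iterate('[@]');           // '[1][2][3]'
filters.iterate(view.numbers)('[@]');  // '[1][2][3]'
```

Both produce the same output here, but only the second decouples the lambda from the data it operates on.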

@jgonggrijp I think one thing I forgot to mention in our previous conversations on lambda is that I make lambda tags at the end of a dotted path still resolve even if they are not children. Thus a lambda at the root of the object tree (which is always at the bottom of the stack!) called iterate would be found even if not a child of numbers. I can't recall if that is correct to spec or I put that in (tired).

The spec forbids it quite explicitly. Here's the quote from interpolation.yml; there is similar phrasing in sections.yml.

https://github.com/mustache/spec/blob/59be9b26043fcc4c68bc6dee41abbb8c9207d971/specs/interpolation.yml#L19-L22

That being said it looks like | fixes my previously mentioned "list" problem.

(...)

Makes that possible aka pass the whole list!

As I wrote before, I like it, too.

@determin1st

(...) english existed prior to mustache spec, so which consensus is more overwhelming?

English may be older, but it is a natural language, so it is hopelessly vague and ambiguous. "Section" can also mean "to cut open a dead body" and "block" can also mean "cuboid piece of wood". In practice, you can make words mean anything you want.

In the specification of a computer language, we are trying to achieve the exact opposite: very specific and completely unambiguous. There can be only one meaning of "block" and only one meaning of "section".

i put an equality sigil between {{$block}} and {{#section}} because they follow the same pattern,

Only syntactically. They have different semantics; a block is always rendered exactly once, though its contents may be overridden from the outside. A section may be rendered zero or more times, depending on the value that the key resolved to.

imo, more wording doesnt form consistency, the opposite. thus tags i see are {{$name}} and {{#name}}

Specifically, a tag is anything that starts with {{ and ends with }} (or other delimiters if you overrode those).

in HTML, name tag or keyword or tag - matters, it says what type is the elment, what set of render rules applies. in mustache, tags refer to data, not render rules. type and rendering is determined differently. {{div}} doesnt need {{/div}} in mustache, it doesnt form a pair

They are more similar than you think, but you need to realize that in Mustache, the sigil (!#^$><& or empty for a variable) takes the role that div or span would take in HTML. The content of the tag (the text between the sigil and the closing delimiter) is like a single tag attribute in HTML. The only real difference between Mustache and HTML is that in a closing tag, Mustache repeats the attribute while HTML repeats the sigil.
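That anatomy can be shown with a toy tag splitter. The regex and the field names are invented for illustration only; no implementation is required to tokenize this way, and delimiter changes via {{=...=}} are ignored here.

```javascript
// Toy splitter for a single tag with the default {{ }} delimiters:
// captures the sigil (if any) and the content (the "attribute").
const TAG = /^\{\{([!#^$><&\/]?)\s*(.*?)\s*\}\}$/;

function splitTag(tag) {
  const m = TAG.exec(tag);
  if (!m) throw new Error('not a tag: ' + tag);
  return { sigil: m[1], content: m[2] };
}

splitTag('{{#person}}');  // { sigil: '#', content: 'person' }
splitTag('{{name}}');     // { sigil: '',  content: 'name' }
splitTag('{{/person}}');  // { sigil: '/', content: 'person' }
```

Note how {{#person}} and {{/person}} share the same content and differ only in sigil, which is the "repeated attribute" point above.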

@agentgt

Speaking of which, benchmarking across languages is incredibly hard. For one, by the looks of it, your benchmarking is not doing any JIT warm-up.

I'm trying to stay out of the benchmarking discussion, but here I just wanted to mention that I've written a JavaScript-only template engine comparison benchmark because of Wontache. It's not in a user-friendly form yet, but it does take JIT warmup into account.

https://gitlab.com/jgonggrijp/wontache/-/tree/integration/comparison

If you want to run it, you need to

I seriously doubt any of the Javascript implementations or even yours does this and part of this is because why would they need to since they just hand off a string.

However for a true real world benchmark it should matter hence why I say you need to load up some server or something similar where you get multiple threads and output going on.

Here I need to point out that JavaScript strings are always UTF-16 and there isn't really anything JS programmers can do about it.

Next post by you:

To get us back on track of power lambdas I am really liking the | notation as it disambiguates a regular old school lambda call from the newer power lambdas.

The way I've thought about it so far, pipes/filters are actually distinct from lambdas and power lambdas. A pipe always receives a context value as an argument; a lambda never does (although power lambdas have a backdoor to retrieve that information and much more). A lambda might return a replacement template (or override the template in other ways if it is a power lambda), while a pipe can only return a new context stack frame.

Old lambdas and pipes can both do things that the other cannot. Power lambdas would be able to do everything that old lambdas and pipes can do and much more, but at the cost of being more complicated to use.

That being said, if you can think of a way to combine pipes and power lambdas into a single thing, I'm certainly interested.

(You continued to suggest what might be called "power filters". It looks interesting but also very complicated. I'm undecided whether it is better than just having filters and power lambdas as two separate features.)

@jgonggrijp Do you have any wontache branches or work experimenting with this | notation?

No, but I plan to do that eventually.

EDIT I guess where it also gets really tricky is if multiple piping is allowed.

{{#context | lambda1 | lambda2 }}
some section
{{/}}

Does lambda1 just not get to participate in template creation? e.g. lambda2 just gets the new top of the node stack generated from lambda1 or something more exotic?

Or do we just limit one |?

Plain pipes/filters as described by @bobthecow can be chained as much as you want, because it is just one value in, one value out. For example, a hypothetical {{#list | nonEmpty}} could be split into {{#list | length | nonZero}}. I'm not sure I understand why this gets complicated in the "power filter" variant that you described, but that might be a weak point of the idea.
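The "one value in, one value out" point can be sketched as plain function composition. The `filters` registry and the filter names are invented for illustration; this is not a spec'd mechanism.

```javascript
// Hypothetical filter registry: each filter maps one value to one value.
const filters = {
  length:   xs => xs.length,
  nonZero:  n  => n !== 0,
  nonEmpty: xs => xs.length > 0,
};

// {{#list | length | nonZero}} would resolve `list`, then thread the
// value through each filter from left to right:
const applyFilters = (value, names) =>
  names.reduce((v, name) => filters[name](v), value);

applyFilters([1, 2, 3], ['length', 'nonZero']);  // true
applyFilters([1, 2, 3], ['nonEmpty']);           // true
applyFilters([], ['length', 'nonZero']);         // false
```

Because every step has the same shape, chains of any length compose without special cases.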

For @jgonggrijp and @determin1st if you are wondering how JStachio does this and how others can do this fast is that when the template gets compiled it gets compiled with the model and templates are guaranteed to be static.

(...)

However the major rule to make this all work is that templates cannot be created dynamically.

This is something where power lambdas might also benefit static implementations (and dynamic ones that care about precompilation). The magic object might provide a way to override the section contents with a template that was already compiled ahead of time (instead of having to return it as a string) and/or a way to override the contents with a fixed string, without passing any more template processing over it.

Also, to return to the wrapping use cases that we discussed before: the magic object could also provide ways to add a prefix and/or suffix to the section.
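A rough sketch of that wrapping idea, with the caveat that the magic API shown here (setPrefix/setSuffix) is entirely invented; nothing like it is spec'd yet.

```javascript
// Hypothetical power lambda: declare a prefix/suffix via the magic instead
// of concatenating strings and re-compiling a template at runtime.
function card(sectionText, magic) {
  magic.setPrefix('<div class="card">');
  magic.setSuffix('</div>');
  return sectionText; // the section content itself is left untouched
}

// A toy stand-in for the implementation side, to show the intended effect:
const magic = {
  prefix: '', suffix: '',
  setPrefix(p) { this.prefix = p; },
  setSuffix(s) { this.suffix = s; },
};
const body = card('{{title}}', magic);
magic.prefix + body + magic.suffix;  // '<div class="card">{{title}}</div>'
```

The implementation could accept precompiled templates instead of strings for the prefix and suffix, which is what makes this interesting for static implementations.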

@determin1st

probably i wasnt clear enough that my brain apparatus turns off on mustache symbols >*< or words like "partial". its a twilight zone for me, i wont be able to catch up with those. (...)

Do you mean there is nothing we can do about this? Would you accept help?

@agentgt

I just honestly don't have the patience of @jgonggrijp, who is a saint.

Ha! Thanks. ❀️

So I may just page out for a while until we get back to how we can get power lambdas in.

The discussion is definitely going everywhere. I'm accepting it, because the side tracks tend to be interesting and I honestly don't know how discussions about the Mustache language could be structured such that each topic is neatly organized in a separate thread. I have considered enabling GitHub Discussions because of this very issue ticket, but as far as I can tell from the documentation, there is no way to move individual posts to different threads, so I would just be spreading the problem over two places.

Anyway. I'm all in for discussing how to spec power lambdas again. I will write a new post after this one so it will be easier to link to it.

jgonggrijp commented 10 months ago

Here is the latest state of my ideas about power lambdas and how to spec them. I believe that specification should follow after some implementations already exist and that the spec should be checked for realism with respect to implementations. I have not implemented any power lambda feature myself (yet), so take this description with a grain of salt.

Power lambdas would essentially be a collection of features to supercharge the capabilities of lambdas, while changing as little about the language as possible. There is no syntactic change and only one semantic change: lambdas that can handle it receive an additional argument, which I refer to as the "magic".

The magic adds one or more of the following features to the existing capabilities of lambdas, in an implementation-defined way. Implementations are free to omit some of them, but encouraged to support as many as they can.

  1. Retrieve a list of all keys that are visible in the context stack.
  2. Look up the value of a given key in the context stack. Ideally, it should also be possible to see which context frame underlies the value that was found.
  3. Retrieve any frame in the context stack (intended as read only).
  4. Override the content with an external template that was already compiled beforehand (rather than returning a string that will be compiled or interpreted as a template on the fly).
  5. Override the content with a final string that is not processed further (rather than returning a string that will be compiled or interpreted as a template). For sections, this string will not be escaped; if the string needs to be escaped, the power lambda should do it.
  6. Render a template of choice against the current context stack and obtain the result as a string, without it necessarily being interpolated.
  7. Define a prefix and/or suffix, in order to wrap whatever ends up being rendered in the place of the interpolation tag or section. This can currently already be done in section lambdas by concatenating the prefix, section content argument and suffix and returning the resulting string as a new template, but this feature makes it possible for interpolation tags as well and removes the necessity to compile or interpret a template at runtime. Implementations may support static strings, template strings or precompiled templates as prefix/suffix, or at their option, multiple of those.

I envision this as an optional extension module, perhaps ~power-lambdas.yml. It would follow a slightly different format from the existing ~lambdas.yml: each implementation that attempts to pass the specs in that module will need to bring its own lambdas.
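To make the shape of the idea concrete, here is a rough JavaScript sketch of features 1 through 3. All names (makeMagic, keys, lookup, getContextFrameFromTop) are invented for illustration; the proposal deliberately leaves the actual interface implementation-defined.

```javascript
// Toy "magic" over a context stack (array, root first). Invented names.
function makeMagic(stack) {
  const frames = () => stack.filter(f => f !== null && typeof f === 'object');
  return {
    // feature 1: all keys visible anywhere in the context stack
    keys: () => [...new Set(frames().flatMap(f => Object.keys(f)))],
    // feature 2: resolve a key top-down, reporting the frame it came from
    lookup(key) {
      for (let i = stack.length - 1; i >= 0; i--) {
        const f = stack[i];
        if (f !== null && typeof f === 'object' && key in f)
          return { value: f[key], frame: f };
      }
      return undefined;
    },
    // feature 3: retrieve any frame, counted down from the top
    getContextFrameFromTop: n => stack[stack.length - 1 - n],
  };
}

const magic = makeMagic([{ fruit: 'apple' }, { fruit: 'banana' }]);
magic.lookup('fruit').value;      // 'banana' (top frame wins)
magic.getContextFrameFromTop(1);  // { fruit: 'apple' }
```

An implementation would hand such an object to the lambda as its extra argument; lambdas get full stack access, but only indirect and piecemeal, and read-only.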

agentgt commented 10 months ago

I like the proposal. I guess I would challenge whether lambdas need the whole stack. I'm biased as that part (giving the lambda the whole stack) is more challenging for my implementation. I guess what lambdas do you see needing the whole stack?

I suppose it can be an implementation option. I'm just afraid all the Γ  la carte options might be hard to document or express in the manual.

Going back to dotted notation for a second on the lambda. We talked earlier how I cheat sort of with:

{{#node.lambdaNotChildButSomeWhereUpTheStack}}
{{/node.lambdaNotChildButSomeWhereUpTheStack}}

Part of that is because lambdas in practice just aren't really data like they are expressed in the manual. Also, I can just say lambdaNotChildButSomeWhereUpTheStack is hanging off every node magically and is therefore not in violation of the spec (also, if there were a field with that name, it would take precedence).

That is, the use case I have seen is one more like Handlebars, where they are globally registered like utility methods, which in a Mustache sense would be them hanging off the root node (bottom of stack, unnamed) or, to make dotted paths work more easily, appearing to hang off every (child) node like what I do (EDIT: the lambda search still goes up the stack with dotted paths, but it doesn't then go down another branch looking for the lambda, so it is not a global search).

I wonder how other implementations are registering lambdas and if it makes sense to explore that?

Because, if for most implementations lambdas are more local, aka hanging off deeply nested nodes, then why do they need the entire stack when they inherently know about the stack?

To use boring software engineering logic: why or how would some deeply nested component know or need the entire stack, when it really should not be that omniscient? EDIT: and if it did need it, it could just manually walk up the object graph internally (like several JS examples I believe do).

It seems to me a real challenge for a naive mustache user is:

  1. get access to lambda
  2. pass to lambda correct part of stack

The first one I tried to make a little easier in my implementation. Because that first part is successful, passing the correct part of the stack is easier, and thus the lambdas do not need the entire stack.

I have to wonder if needing the entire stack is because of that first part and/or registration being part of the data.

determin1st commented 10 months ago

@agentgt

Since you have the time why don't you go look at this and port over to PHP/JS instead: https://github.com/agentgt/template-benchmark . Oh that isn't a benchmark I came up with. It is used all over for Java templating engines and I can tell you it isn't ideal precisely because it isn't real world.

I've checked that. It's not real world, you say, because it doesn't match with itself: https://github.com/agentgt/template-benchmark/blob/utf8/src/test/resources/expected-output.html. I took this template, https://github.com/agentgt/template-benchmark/blob/utf8/src/main/resources/templates/stocks.mustache.html; open both and see that the initial padding doesn't match, the head tag in the template is tabbed. If you fix that, the render still won't match the expected output: https://github.com/agentgt/template-benchmark/blob/a679c4f2989a34f3eb75f84282da176d42c60354/src/main/java/com/mitchellbosecke/benchmark/JStachioNoLambda.java#L75C79-L75C79 puts class="minus" without a space, and there is more indentation that doesn't match. So yes, I fixed it and converted it into PHP.. but with one test file, I don't feel the need for the loop and measurement; it should be multiple different tests and multiple files with them. The UTF-8 characters don't constitute a problem for a templating engine, because it doesn't play with characters, it does substitutions. When the source and the target have the same character encoding, there will be no transcodings.. so I don't include it in my tests. Here is the fixed template version with run.php

jstachio

(it looks like a broken image; download it and rename it to a .7z archive)

@jgonggrijp

Yes, I'm aware that your implementation is different, but this is why I'm trying to establish common vocabulary with you. It would be nice if we could use the same words and mean the same things.

I'm not convinced by the argument that "our grandpas did it this way" (it is in the spec), because our great-grandpas also did things (English). I'm interested in reducing the mental capacity required for templating.

its misleading. you wrote earlier {{#block}}..{{/block}}

Did I? Where?

Maybe not. I'm totally out of the Mustache {{$block}} syntax, so I got confused.

What is the difference between the two bottom wontache rows? They have wildly different numbers.

As Wontache doesn't have names, it is basically a full render every cycle (compile()*render()*N) versus a single compile() pass and then render() every cycle (compile()+render()*N).

agentgt commented 10 months ago

@determin1st JStachioNoLambda was a test without using lambdas and was less optimal, so I didn't care if it really matched the output. The unit tests actually strip all whitespace from the output. The spacing is missing because I was doing what Mustache.java did for that test. Mustache.java is obviously cheating, but it is such a minor cheat that I left it.

Here is the unit test that strips the white space: https://github.com/agentgt/template-benchmark/blob/a679c4f2989a34f3eb75f84282da176d42c60354/src/test/java/com/mitchellbosecke/benchmark/ExpectedOutputTest.java#L112

The reason that benchmark does not care about whitespace is because some templating engines alter that. There are several that are not even remotely like Mustache.

I still can't download your archive, even after changing the extension. Maybe just put up a repo or something instead of a file. But yeah, if you are not looping it is really going to be slow, because Java has an insanely slow startup time compared to scripting languages, so it takes quite a long time before that initial startup cost is mitigated.

Now I could native compile the Java with GraalVM native to do your single iteration but I don't care to.

I tried to explain in the readme of that project why pre-encoding in UTF-8 matters for Java. How the fuck do you think the templates get written? I don't care if JS or PHP doesn't allow you to write raw bytes. In Java it is allowed and is what actually happens over the wire.

Oh, and speaking of output, or flawed output: just letting various language implementations write to memory as a benchmark test... yeah, it's flawed. They can do all sorts of hacks like pre-allocating or using direct memory access etc. That is why you need to test the output... which is why IO gets into the picture.

But WTF do I know I'm grandpa (in great irony I just turned 43 so I feel like one πŸ˜„).

To avoid clogging this thread yet again can you just file a bug or do a discussion here: https://github.com/agentgt/template-benchmark

I just want to keep it focused on power lambdas.

agentgt commented 10 months ago

The UTF-8 characters don't constitute a problem for a templating engine, because it doesn't play with characters, it does substitutions. When the source and the target have the same character encoding, there will be no transcodings.. so I don't include it in my tests. Here is the fixed template version with run.php

This is so incredibly naive that I have to respond. In Java and JavaScript, because their string representation is UTF-16 (more or less, ignoring fringe differences), there is always transcoding every time something is written out. What is happening in several of the Java templating engines is that the parts of the template that are not substitutions, aka the static parts, are turned into UTF-8 bytes in advance. If you are just writing everything below 128 then yes, no conversion has to happen, because ASCII is a subset of most encodings.
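The pre-encoding trick described above can be sketched in a few lines. The names (staticParts, renderBytes) are illustrative; TextEncoder/TextDecoder are the standard WHATWG Encoding API, available in Node and browsers.

```javascript
// "Pre-encoding": transcode the static parts of a template to UTF-8 bytes
// once, so only the substituted values need encoding at render time.
const enc = new TextEncoder();
const staticParts = ['<p>', '</p>'].map(s => enc.encode(s));

function renderBytes(value) {
  const middle = enc.encode(value); // only the dynamic part is encoded now
  const out = new Uint8Array(
    staticParts[0].length + middle.length + staticParts[1].length);
  out.set(staticParts[0], 0);
  out.set(middle, staticParts[0].length);
  out.set(staticParts[1], staticParts[0].length + middle.length);
  return out;
}

new TextDecoder().decode(renderBytes('cafΓ©'));  // '<p>cafΓ©</p>'
```

For a template with large static parts, this moves most of the UTF-16 to UTF-8 transcoding work out of the render loop.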

I know you don't care in your world about bytes but it makes a difference in my world.

Now I don't know what PHP does for characters, but I decided to be new school and ask ChatGPT.

PHP uses a variable-length encoding for its string representation, and it does not use UTF-16. Instead, it primarily uses a single-byte encoding for strings (ASCII or ISO-8859-1) or a variable-length encoding like UTF-8 to handle a wide range of characters from different languages.

UTF-16 is a fixed-width encoding, where each character is represented by two bytes. In contrast, UTF-8 is a variable-width encoding, which uses 1 to 4 bytes to represent characters. PHP's choice of using UTF-8 for its internal string representation is more memory-efficient and compatible with a broader range of character sets and languages compared to UTF-16.

PHP's multibyte string functions and the mbstring extension can be used to work with multibyte character encodings like UTF-8 effectively. However, at the core, PHP stores strings as sequences of bytes, and it's up to developers to handle character encoding and decoding as needed for their specific use cases.

So does your implementation, like, natively deal with UTF-8 bytes? Otherwise someone is doing the transencoding for you, correct? Also ChatGPT appears to conflict with this: https://www.php.net/manual/en/language.types.string.php#language.types.string.details where it says it's basically just single bytes... based on the encoding of the PHP file... holy fuck.

Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. For instance, is the string "Γ‘" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, C form), "\x61\xCC\x81" (UTF-8, D form) or any other possible representation? The answer is that string will be encoded in whatever fashion it is encoded in the script file. Thus, if the script is written in ISO-8859-1, the string will be encoded in ISO-8859-1 and so on. However, this does not apply if Zend Multibyte is enabled; in that case, the script may be written in an arbitrary encoding (which is explicitly declared or is detected) and then converted to a certain internal encoding, which is then the encoding that will be used for the string literals. Note that there are some constraints on the encoding of the script (or on the internal encoding, should Zend Multibyte be enabled) – this almost always means that this encoding should be a compatible superset of ASCII, such as UTF-8 or ISO-8859-1. Note, however, that state-dependent encodings where the same byte values can be used in initial and non-initial shift states may be problematic.

So I can see why you don't care or hand it off. So maybe your implementation somehow avoids transencoding because it's all UTF-8 bytes, but in Java and JavaScript that is not the case.

BTW, given your apparent greenness: the encoding of the original template (not PHP) is the only encoding you can use. That is, whatever encoding the template on the filesystem was encoded with is the only encoding you can serve, unless that encoding is UTF-16 (which for files is rarely the case). That is, transencoding from latin-1 to UTF-8 can cause corruption. That is why almost everyone uses UTF-8 everywhere except in memory.

However, if your templates are in memory or in the language's string literals, you can disregard that.

Anyway, again, I am trying to help you because you seem incredibly smart but possibly lacking experience. Perhaps too smart, and I think overconfidence is setting in.

Ultimately, this means writing correct programs using Unicode depends on carefully avoiding functions that will not work and that most likely will corrupt the data and using instead the functions that do behave correctly, generally from the intl and mbstring extensions. However, using functions that can handle Unicode encodings is just the beginning. No matter the functions the language provides, it is essential to know the Unicode specification. For instance, a program that assumes there is only uppercase and lowercase is making a wrong assumption.

So does your implementation use the mbstring stuff or do you just assume single bytes?

agentgt commented 10 months ago

@jgonggrijp BTW, the above discussion, while distracting and in vain, does have merit for my points on the pitfalls of user concatenation in lambdas.

If a Mustache template is not in the same encoding as the PHP library expects by default (because the PHP library is entirely written in UTF-8 files and assumes all content is), or the PHP library does handle various encodings, and someone writes a lambda, they sure as hell better not use the builtin concatenation but mbstring (or whatever handles multi-byte concat). I might be wrong on the above logic.

BTW @determin1st normal PHP templates that just echo out work just like the pre-encoding I'm doing in Java, because the string literals are already UTF-8 (which you can see is done in this techempower benchmark): https://github.com/TechEmpower/FrameworkBenchmarks/blob/b5a1618d5cca236cbf996d696833b944b738c262/frameworks/PHP/php-ngx/app.php#L80

You can't do that because you are reading templates from other sources, I think? Or you just assume everything is UTF-8 and you are actually manipulating bytes and characters and it happens automatically for you. That is, you already get pre-encoding and you do not have to transencode, so in a benchmark that tests output, a JavaScript engine would have to do more work than a naive everything-is-UTF-8 PHP implementation.

jgonggrijp commented 10 months ago

@agentgt

I like the proposal.

Glad to hear that!

I guess I would challenge whether lambdas need the whole stack. I'm biased as that part (giving the lambda the whole stack) is more challenging for my implementation. I guess what lambdas do you see needing the whole stack?

This may be superfluous for you, but I feel I need to mention this so that someone reading along will not be confused: I carefully omitted any feature that would enable the lambda to obtain the entire context stack at once. The proposal implies full stack access, but only indirect and piecemeal (although an implementation might still choose to actually provide the whole object in its raw form).

With that out of the way, here are the use cases that motivated my inclusion of features 1, 2 and 3.

Firstly, consider the following data (a.k.a. view),

{   // the lowest level is the basement
    fruit: 'apple',
    ground: {
        fruit: 'banana',
        loft: {
            fruit: 'cherry',
            roof: {
                fruit: 'date'
            }
        }
    }
}

and the following Mustache template.

{{#ground.loft.roof}}
{{fruit}}
{{/ground.loft.roof}}

This will render with the roof fruit, date, and usually, this is what is intended. However, what if you want to use the roof as the current context but still access the fruit at one of the lower floors? Mustache currently has no way to do this. Handlebars does:

{{#ground.loft.roof}}
{{fruit}} {{!date}}
{{../fruit}} {{!cherry}}
{{../../fruit}} {{!banana}}
{{@root.fruit}} {{!apple}}
{{/ground.loft.roof}}

Feature 3 would allow us to define power lambdas that achieve a similar effect:

{{#ground.loft.roof}}
{{fruit}} {{!date}}
{{parent.fruit}} {{!cherry}}
{{grandparent.fruit}} {{!banana}}
{{root.fruit}} {{!apple}}
{{/ground.loft.roof}}

Internally, the parent lambda might do something like return magic.getContextFrameFromTop(1).
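Such lambdas might look roughly like this. Everything here is hypothetical: the magic API (getContextFrameFromTop, stackDepth) is invented, and the sketch assumes the implementation pushes each dotted step onto the stack, as the example above does.

```javascript
// Hypothetical power lambdas exposing lower context frames.
const view = {
  parent:      (text, magic) => magic.getContextFrameFromTop(1),
  grandparent: (text, magic) => magic.getContextFrameFromTop(2),
  root:        (text, magic) =>
    magic.getContextFrameFromTop(magic.stackDepth() - 1),
};

// Toy magic over the stack as it would look inside {{#ground.loft.roof}}:
const stack = [
  { fruit: 'apple' },   // basement (root)
  { fruit: 'banana' },  // ground
  { fruit: 'cherry' },  // loft
  { fruit: 'date' },    // roof (top)
];
const magic = {
  getContextFrameFromTop: n => stack[stack.length - 1 - n],
  stackDepth: () => stack.length,
};

view.parent(null, magic).fruit;       // 'cherry'
view.grandparent(null, magic).fruit;  // 'banana'
view.root(null, magic).fruit;         // 'apple'
```

Each lambda returns a frame, which the dotted-path resolution algorithm then uses to look up fruit, mirroring Handlebars' ../ and @root.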

If you have feature 3, features 1 and 2 are technically redundant, so they could be considered conveniences. I originally came up with those features in order to enable things like list nonemptiness checks, first/last checks and access, and name-value iteration over objects. I now believe those use cases are better served by filters, but I wouldn't rule out the possibility that someone comes up with another good use case.

I suppose it can be an implementation option. I'm just afraid all the Γ  la carte options might be hard to document or express in the manual.

Good point. I guess the manual could just illustrate two or three examples of things that would be possible with power lambdas, without explicitly mentioning the seven features. It should then tell users to refer to the documentation of their implementation for details.

Going back to dotted notation for a second on the lambda. We talked earlier how I cheat sort of with:

{{#node.lambdaNotChildButSomeWhereUpTheStack}}
{{/node.lambdaNotChildButSomeWhereUpTheStack}}

Part of that is because lambdas in practice just aren't really data like they are expressed in the manual.

You mean lambdas aren't data in your implementation or in any implementation? In either case, why not?

Also, I can just say lambdaNotChildButSomeWhereUpTheStack is hanging off every node magically and is therefore not in violation of the spec (also, if there were a field with that name, it would take precedence).

Better just be honest that your implementation is doing something nonstandard. πŸ˜‰ Please don't take this as a value judgement, though. Portability between implementations is a feature, and this is why we standardize. However, deviating from the standard is still fine, as long as it is a deliberate choice and you're honest about it. If your users know it, they can decide for themselves which is more important to them: the portability (for this edge case), or the feature that you are giving them in return (I'm guessing that is performance and type safety).

That is, the use case I have seen is one more like Handlebars, where they are globally registered like utility methods

Right, the helpers.

which in a Mustache sense would be them hanging off the root node (bottom of stack, unnamed)

Coincidentally, that is a planned feature of Wontache as well. Or rather, a sub-root node. I was planning to call it the "ambient context". It wouldn't be restricted to lambdas, but it would probably be most useful for lambdas, anyway.

I wonder how other implementations are registering lambdas

The spec implicitly assumes that either the host language has first-class functions or the implementation supports method-bearing objects as context stack frames. For the original Ruby implementation, as well as for all existing JS implementations, both of those assumptions hold true. I suspect this is the case for most implementations in dynamically typed programming languages. I'm not familiar enough with Mustache implementations for statically typed languages to comment on how they approach it.

Because, if for most implementations lambdas are more local, aka hanging off deeply nested nodes, then why do they need the entire stack when they inherently know about the stack?

Let me reverse that question: why would a lambda at the root of the context stack already be aware of the entire stack?

To use boring software engineering logic: why or how would some deeply nested component know or need the entire stack, when it really should not be that omniscient? EDIT: and if it did need it, it could just manually walk up the object graph internally (like several JS examples I believe do).

this will get you only one frame; most languages do not have a recursive this.this. Also, it is important to keep in mind that a frame in the stack need not be a member of the frame below it (if we consider the root the bottom). Consider this contrived example:

{
    name: 'John',
    mother: {
        name: 'Joan'
    },
    father: {
        name: 'Jon'
    }
}
{{#name}}{{#father}}{{#mother}}{{#name}}{{#name}}{{#father}}
{{/father}}{{/name}}{{/name}}{{/mother}}{{/father}}{{/name}}

At its climax (by the end of the first line), the context stack for the above template will contain, from root to top:

  1. the input data
  2. the name "John"
  3. the father object
  4. the mother object
  5. the name "Joan"
  6. the name "Joan" (repeated, this is not a typo)
  7. the father object

Since the layering of the context stack is determined by the structure of the template, it is impossible for a lambda to know it beforehand. This is true regardless of whether the lambda is high or low in the context stack.
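
The layering described above can be simulated with a toy resolver in JavaScript (a sketch for illustration only, not any particular implementation's code):

```javascript
// Toy sketch: simulate how the context stack grows as the renderer
// enters each section of the template above.
const data = {
    name: 'John',
    mother: { name: 'Joan' },
    father: { name: 'Jon' }
};

// Mimic Mustache name resolution: search the stack from top to bottom.
function resolve(stack, key) {
    for (let i = stack.length - 1; i >= 0; i--) {
        const frame = stack[i];
        if (frame !== null && typeof frame === 'object' && key in frame) {
            return frame[key];
        }
    }
    return undefined;
}

// Entering {{#name}}{{#father}}{{#mother}}{{#name}}{{#name}}{{#father}}
// pushes one frame per section, each resolved against the stack so far.
const stack = [data];
for (const key of ['name', 'father', 'mother', 'name', 'name', 'father']) {
    stack.push(resolve(stack, key));
}
// stack now holds, above the root: 'John', the father object, the mother
// object, 'Joan', 'Joan' again, and the father object once more.
```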

It seems to me a real challenge for a naive mustache user is:

  1. get access to the lambda
  2. pass the correct part of the stack to the lambda

The first one I tried to make a little easier in my implementation. Because that first part was successful, passing the correct part of the stack is easier, and thus the lambdas do not need the entire stack.

I have to wonder if needing the entire stack is because of that first part and/or registration being part of the data.

These are important questions. Have my answers so far shed light on them for you?

@determin1st

I’m not convinced by the argument that "our grandpas did it this way." (it is in the spec), because our great-grandpas also did things (english).

I will answer that by repeating earlier words of mine:

English may be older, but it is a natural language, so it is hopelessly vague and ambiguous. "Section" can also mean "to cut open a dead body" and "block" can also mean "cuboid piece of wood". In practice, you can make words mean anything you want.

In the specification of a computer language, we are trying to achieve the exact opposite: very specific and completely unambiguous. There can be only one meaning of "block" and only one meaning of "section".

im interested in reducing mental capacity required for templating

For you, sticking to your favorite interpretations of "section" and "block" might require the least mental capacity. For everyone who is already familiar with the spec, however, the opposite is true.

What is the difference between the two bottom wontache rows? They have wildly different numbers.

as wontache doesnt have names, its basically a full render every cycle (compile()*render()*N) versus a single compile() pass and then render() every cycle (compile()+render()*N).

Thanks. So if I understand the numbers correctly, by your measurement, Wontache has the (second?) slowest compilation but the fastest rendering among the JavaScript implementations.

@agentgt

(...) How the fuck do you think the templates get written? I don't care if JS or PHP doesn't allow you to write raw bytes. (...)

This is so incredibly naive that I have to respond. (...)

BTW given your apparent greenness (...)

Anyway again I am trying to help you because you seem incredibly smart but possibly lacking experience. Perhaps too smart and overconfidence I think is setting in.

Please don't write comments like these. They can come across as hostile or condescending and they do not help the discussion. Let others be as they are and stick to the contents.

agentgt commented 10 months ago

Please don't write comments like these. They can come across as hostile or condescending and they do not help the discussion. Let others be as they are and stick to the contents.

I apologize. My patience has just not been there lately.

I will say swearing is less hostile where I'm from and more for emphasis.

When I had to go back and forth, context-switching between power lambdas and PHP and trying to run bat files from Windows to run PHP, I lost my patience.

The great irony is I do care and I want to help them and the project but my passion took over my filter.

I am sorry.

Let others be as they are and stick to the contents.

As a critique, I think you perhaps need to steer it more so that we get less off track. I felt like I was trying to do this more than I have had to in other forums/discussions/lists. I mean that kindly. Exploration is good but focus can be better.

jgonggrijp commented 10 months ago

You are probably right that I could do more to keep the discussion focused. I am easily sidetracked in general. Would you have a suggestion on how to approach this?

agentgt commented 10 months ago

I’m not sure at the moment as I’m still stunned and embarrassed rereading my comments.

Let me get back to you shortly on that.

@determin1st I’m sorry. It was incredibly rude of me.

jgonggrijp commented 10 months ago

While you think about it, I will toss some discussion management ideas that I just came up with myself. Feedback welcome.

When a discussion goes sideways, anyone, but especially a maintainer like myself, should suggest an appropriate other place to continue the side track.

Issue tickets should be used to discuss the existing spec. For everything else, such as implementation issues, performance measurements and ideas for new language features, I'm considering to open the GitHub Discussions feature.

Discussions that went everywhere, like the current one, would be locked. The individual topics could be resumed in new, focused threads.

determin1st commented 10 months ago

@agentgt i dont treat myself as your fragile user who needs personal help or attention or lessons or apologies. i kind of expressed that earlier in the "bashing" and "extracting fruits" sentence. i really dont need users or fans.

Here is the unit test that strips the white space: https://github.com/agentgt/template-benchmark/blob/a679c4f2989a34f3eb75f84282da176d42c60354/src/test/java/com/mitchellbosecke/benchmark/ExpectedOutputTest.java#L112

The reason that benchmark does not care about whitespace is because some templating engines alter that. There are several that are not even remotely like Mustache.

im not good with english but i know that "does not care about whitespace" and "strips the white space" arent compatible.

Oh, and speaking of output, or flawed output: just letting various language implementations write to memory... as a benchmark test. Yeah, it's flawed. They can do all sorts of hacks like pre-allocating or using direct memory access etc. That is why you need to test the output... which is why IO gets into the picture.

implementations that are compatible can be compared. apples can be compared with apples. yours does apples vs oranges. that's the area where syntax and feature comparison plays a role, not performance. im not persuaded by this suggestion:

{{#-index.isEven}}...{{/-index.isEven}}

mine's better:

{{#isEven index}}...{{/}}

This is so incredibly naive that I have to respond. In Java and Javascript, because their string representation is UTF-16 (more or less ignoring fringe differences), there is always transcoding every time something is written out. What is happening in several of the Java templating engines is that the parts of the template that are not substitutions, aka the static parts, are turned into UTF-8 bytes in advance. If you are just writing everything below 256 then yes, no conversion has to happen, because ASCII is a subset of most encodings.

better respond with a test, some template and data to feed the engine and get the problem.

I still can't download your archive even changing the extension. ...to be new school and ask Chat GPT. Also chat gpt appears... BTW given your apparent greenness the encoding of the original template (not PHP) is the only encoding you can use. That is, whatever template is on the filesystem, whatever it was encoded with, is the only encoding you can serve, unless that encoding is UTF-16 (which for files is rarely the case). That is, transcoding from latin-1 to UTF-8 can cause corruption. That is why almost everyone uses UTF-8 everywhere except in memory.

im too green for chatgpt. at least i know how to download an image from the internet, pressing CTRL+S in the browser. im not asking why you put template data into Java sourcefile. i dont get what's behind this topic.. yes, PHP parses and interprets UTF8 sourcefiles. very good language.

So does your implementation use mbstring stuff or do you just assume single byte?

if you give me a use case or an example, ill start using it right away. maybe i need to start using an image rendering extension or server cluster, but what i have for now is enough.

@jgonggrijp

{{#ground.loft.roof}}
{{fruit}} {{!date}}
{{../fruit}} {{!cherry}}
{{../../fruit}} {{!banana}}
{{@root.fruit}} {{!apple}}
{{/ground.loft.roof}}

this is interesting, but without the /: single dot - current (0), two dots - previous (-1), three - (-2). a backpedal like that ive implemented for the _ helper value prefix.

When a discussion goes sideways, anyone, but especially a maintainer like myself, should suggest an appropriate other place to continue the side track.

Issue tickets should be used to discuss the existing spec. For everything else, such as implementation issues, performance measurements and ideas for new language features, I'm considering to open the GitHub Discussions feature.

Discussions that went everywhere, like the current one, would be locked. The individual topics could be resumed in new, focused threads.

im okay with one big mixed wall of messages.

agentgt commented 10 months ago

@jgonggrijp

These are important questions. Have my answers so far shed light on them for you?

Yes. I kind of minced my words on the whole bottom-of-the-stack node not needing the stack. I think I meant the reverse 😄

As for lambda being data. Yes from an academic sense. It all just keeps coming back to the object tree and context stack.

Where does that data come from? Probably from a REST call or a database. I don't know modern backend JS that well but I'm guessing the data from a database is probably JSON.

In Java it is usually objects from an ORM.

Regardless, both cases probably do not have lambdas on the object tree, right? You probably have to transform the data or mutate it, right (to add the lambdas)?

I'm not entirely sure how other implementations register lambdas but in Java lambdas are less data like. Java like most statically typed languages is also less mutable.

So what ends up happening particularly in something like Mustache.java is you transform the entire object tree which is quite painful.

In Javascript I presume one can just literally mutate the model and add the lambdas.

So I guess what I mean by the bottom nodes in the tree (well, graph) is that if they are being transformed or mutated, they can have a lambda shoved onto them with the node they want to access, but I suppose that could happen at any node, so I'm not sure why I made the distinction.

Anyway I am curious if you know which implementations besides Ruby and Javascript are doing lambdas in a more advanced way than the spec. I want to do some reconnaissance to understand what is happening in other languages and how much transformation is happening.

Because ultimately you can just transform the data till your template works, but I think that is the pain point (power) lambdas are to fix.

@determin1st

at least i know how to download an image from the internet, pressing CTRL+S in the browser.

https://user-images.githubusercontent.com/16524081/279394091-1e584e11-40df-47fc-8efd-cb5f80150896.jpg

The link for whatever reason did not work at first. It gave an XML error w/ or w/o changing the extension. It worked a little while later, after my comment. I was using curl btw. I'm not sure if it was a CDN issue or what.

jgonggrijp commented 10 months ago

@determin1st

im okay with one big mixed wall of messages.

I can relate. As I wrote before, I personally don't mind if a discussion makes some detours, as long as the detours are interesting. However, I can also imagine that some people find this distracting or annoying. @agentgt is probably not the only person with a preference for discussions that stay on topic. Besides, if we keep discussions focused, it will be easier to find relevant parts of them again later.

Are you also OK with more focused discussions? We can still discuss everything, just in multiple separate tickets/threads. The GitHub Discussions feature does support side threads, so we can also compromise a little.

agentgt commented 10 months ago

Yeah, I prefer more focus. The thread is so long it breaks Safari on my iPhone (I try to use Firefox but sometimes accidentally use iPhone Safari).

I will say I get heated more because I am a business owner and my biz depends on Handlebars/Mustache, both the Javascript and Java implementations. I can assure you how I use Mustache is not just fun exploration but is actually relied on to generate revenue and pay folks. That being said, I care about the greater community and want all boats to rise.

Context switching is painful for me. Also, everyone's time is important. Perhaps it's a failure of the GitHub UI, but scrolling through and deciphering is extremely challenging (well, for me). And it is also painful when someone is just using a spec forum to get ideas for their own very different implementation, disrupting the continuity of thought without sharing or caring for others.

For example I still don't even know what was being proposed with this:

{{#lambda someArg}}

At one point it was a string and then another it was resolved off the stack. Which would have been fine exploration but so many other things popped up.

What I would really like to see is how other Mustache implementations try to solve Mustache's limitations, so @determin1st's ideas were interesting, but then we got carried away with so many syntactical differences and English language semantics. I'm just not smart enough to keep all of it in my head.

The other problem is I wish more implementations provided more documentation. I feel like I have done my best on that front. https://jstach.io/jstachio/

Wontache documentation is also very good but many are not. Having a conversation of Wontache is easier for me as I understand JS, it has documentation, and even a playground.

Implementations like @determin1st's, which do not even have the code available, make it very challenging to get the correct context.


On a separate note I really like the idea of | being a filter.

That is, perhaps power lambdas should be less complex and we add filters as well.

In my implementations, | (aka filter or mapper or combinator, insert name here) will fix the list problem, and I may even add it shortly on an experimental branch.
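
For illustration, a filter-based fix for the list problem might look something like this. Both the `{{#items | isEven}}` syntax and the `filters` registry are hypothetical; nothing here is spec'd:

```javascript
// Hypothetical sketch of the `|` filter idea; neither the syntax nor the
// `filters` registry exists in any spec. This only illustrates the intent.
const filters = {
    isEven: (list) => list.filter((x) => x % 2 === 0)
};

// A renderer supporting `{{#items | isEven}}...{{/items}}` would resolve
// `items` from the context stack, then pipe it through the filter before
// sectioning over the result:
const items = [1, 2, 3, 4];
const sectionValue = filters.isEven(items); // the section iterates over [2, 4]
```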

With that addition (and virtual nodes, which I guess sort of break the spec) we have achieved almost everything we need, with the exception of dynamic templates.

Also, we added Handlebars to our platform a while back, and other than {{else}}, folks for whatever reason avoided much of the other advances it brought, because of confusion and complexity and fragility.

I'm concerned that power lambdas might become that if we make them too complex, so maybe multiple features will make the cognitive load less than one big feature?

EDIT I guess what I'm saying on the two features is we could somehow make | be used to locate and/or transform the stack and then a power lambda would be used for rendering, but it only gets the top of the stack?

Apologies if the above was covered which I think it was and I just can't find the right comment.

jgonggrijp commented 10 months ago

@agentgt

As for lambda being data. Yes from an academic sense. It all just keeps coming back to the object tree and context stack.

Where does that data come from? Probably from a REST call or a database. I don't know modern backend JS that well but I'm guessing the data from a database is probably JSON.

In Java it is usually objects from an ORM.

Regardless both cases probably do not have lambdas on the object tree right? Probably have to transform the data or mutate right (to add the lambdas)?

Yes. While it is left implicit in the manual and the spec, in past discussions this has been considered "not a problem". In fact, most implementers seem to hold the opinion that the data should be transformed before feeding it to the template. This is also why the data is commonly called the "view"; it is a presentation-oriented transformation of some underlying model.

Personally, I agree that there is value in having an intermediate level of abstraction between the raw model and the template. At the same time, I appreciate not having to fiddle much with the data before feeding it into the template.

(...)

In Javascript I presume one can just literally mutate the model and add the lambdas.

So I guess what I mean by the bottom nodes in the tree (well, graph) is that if they are being transformed or mutated, they can have a lambda shoved onto them with the node they want to access (...)

Even in JavaScript, there are limitations. For example, you cannot just attach a method to a number or a boolean. It can be done by object-wrapping a primitive value first, but that is still rather unconventional and not something most JS programmers will like to do.

Anyway I am curious if you know which implementations besides Ruby and Javascript are doing lambdas in a more advanced way than the spec.

I want to do some reconnaissance to understand what is happening in other languages and how much transformation is happening.

Because ultimately you can just transform the data till your template works but I think that is the pain point (power) lambda is to fix.

If we can suffice with shoving lambdas in a bottom frame, and we can avoid having to attach lambdas in strange places just so they can access a particular value, that is a win for even the most dynamic and liberal languages. Filters and/or power lambdas would certainly help with that.

That said, I think we should avoid trying to completely eradicate intermediate data preparation. By expecting the user to write preparation logic, we can keep the template language small and elegant.

agentgt commented 10 months ago

If we can suffice with shoving lambdas in a bottom frame, and we can avoid having to attach lambdas in strange places just so they can access a particular value, that is a win for even the most dynamic and liberal languages.

EDIT I misread that, whoops! I was thinking top of the stack. You mean the lambda is at the top of the object graph.

e.g.

{
    utilLambda: function () { /* ... */ },
    model: { /* ... rest of data */ }
}
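
Why this layout works: Mustache name resolution falls back through the context stack, so a lambda on the root frame stays reachable inside any section. A toy sketch (the lambda follows the current spec's (text, render) convention; the resolver is illustrative):

```javascript
// Illustrative only: a utility lambda on the root frame, next to the model.
const view = {
    utilLambda: (text, render) => render(text).toUpperCase(),
    model: { greeting: 'hello' }
};

// Toy name resolution: search the stack from top to bottom.
function resolve(stack, key) {
    for (let i = stack.length - 1; i >= 0; i--) {
        const frame = stack[i];
        if (frame && typeof frame === 'object' && key in frame) return frame[key];
    }
}

// Inside {{#model}}, the stack is [view, view.model]; utilLambda still resolves:
const lambda = resolve([view, view.model], 'utilLambda');
const result = lambda('{{greeting}} world', (t) => t.replace('{{greeting}}', 'hello'));
// result === 'HELLO WORLD'
```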

That said, I think we should avoid trying to completely eradicate intermediate data preparation. By expecting the user to write preparation logic, we can keep the template language small and elegant.

I agree and do not want to eradicate transformation aka data preparation.

I think, going back to utility lambdas aka Handlebars helpers, what I have seen in the field is the need for cross-cutting lambdas. Probably the most common is i18n, which I cover in my implementation's documentation here: https://jstach.io/jstachio/#faq_i18n

That is, mapping and data preparation are fine, but at some point, with cross-cutting things like i18n, it gets difficult.

So going back to my concern of utility lambdas possibly being the norm instead of specialized ones (the theory, and what I witness in practice, is that the local stuff just gets transformed): are you thinking we allow dotted parent paths, e.g. ../../?

I guess, how does one get access to, say, an i18n lambda in a new power lambda call (this might have been answered earlier, and/or I thought I knew)? Thanks for your patience. I'm getting there on understanding what you are proposing.

EDIT you're saying power lambdas would probably be placed at the top of the object graph and thus would (ignoring name collisions) be accessible everywhere?

Just a radical thought but what if... power lambdas just do not live on the stack?

jgonggrijp commented 10 months ago

@agentgt

On a separate note I really like the idea of | being a filter.

That is perhaps power lambdas should be less complex and we add filters as well.

[good arguments]

I agree. If we get filters in the spec, at least features 1 and 2 (list of all keys, individual key lookup) don't need to be part of power lambdas. Those also happen to be somewhat expensive features.

Ironically, I had just started a similar train of thought after I wrote "keep the template language small and elegant".

EDIT I guess what I'm saying on the two features is we could somehow make | be used to locate and/or transform the stack

To locate values in the stack, yes.

and then a power lambda would be used for rendering, but it only gets the top of the stack?

I would like to at least retain the existing lambda features, i.e., (1) read the unrendered section content and (2) either push a new frame on the stack or override the template in the section. My proposed features 4-7 make it possible to do both at the same time and give more flexibility in how to approach the override. By itself, I think that would be a well-rounded extension to the template language.

As for what information a power lambda should be able to access, I'm less clear:

Apologies if the above was covered which I think it was and I just can't find the right comment.

Maybe you mean https://github.com/mustache/spec/issues/135#issuecomment-1788108214, where I gave reasons for including features 1-3. Over there, I already acknowledged that 1 and 2 are not strictly necessary, but I still thought they might be useful. At this point, I'm starting to think they should probably just be left out. Ironically, that means I'm gradually moving to the opinion that @bobthecow already defended at the beginning of this discussion.

jgonggrijp commented 10 months ago

@agentgt

I think going back utility lambdas aka handlebar helpers what I have seen in the field is the need for cross cutting lambdas. Probably most often is i18n which I cover in my implementations documentation here https://jstach.io/jstachio/#faq_i18n

Yes, that is a primary motivation for me as well. I wrote handlebars-i18next and I want to write something like that for Wontache as well, but I cannot do it in a satisfying way without power lambdas.
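
To make that motivation concrete, here is a rough sketch of what an i18n helper could look like under the proposal. Every name here (the context handle, its resolve() and render() methods, the translations table) is hypothetical; the proposal deliberately leaves the exact interface implementation-defined:

```javascript
// Hypothetical sketch of a power-lambda-based i18n helper. The `context`
// handle and its resolve()/render() methods are illustrative only.
const translations = { nl: { hello: 'hallo' } };

function i18nLambda(sectionText, context) {
    // Resolve a key against the full context stack (one of the proposed powers):
    const locale = context.resolve('locale');
    const key = sectionText.trim();
    const translated = (translations[locale] || {})[key] || key;
    // Render a template of choice against the current context (another proposed power):
    return context.render(translated);
}

// Minimal fake context handle, just enough to run the sketch:
const fakeContext = {
    stack: [{ locale: 'nl' }],
    resolve(key) {
        for (let i = this.stack.length - 1; i >= 0; i--) {
            if (key in this.stack[i]) return this.stack[i][key];
        }
    },
    render(template) { return template; } // no tags in this toy example
};

const greeting = i18nLambda(' hello ', fakeContext); // 'hallo'
```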

EDIT you're saying power lambdas would probably be placed at the top of the object graph and thus would (ignoring name collisions) be accessible everywhere?

Yes

Just a radical thought but what if... power lambdas just do not live on the stack?

That would be another extension that I would also be in favor of. I previously coined the term "ambient context" for this (which could possibly not only contain lambdas but, for example, also global constants). However, since lambdas can (and must) currently live inside the stack, I think it would make sense to still allow that as well. In fact, I can imagine situations where it actually makes sense for a lambda to exist as a nested member somewhere, with it being available in some sections but not in others.

jgonggrijp commented 10 months ago

@agentgt and @determin1st I will return later today and then probably start enabling the Discussions feature and reorganizing the past discussions. You can keep replying here in the meanwhile, but I recommend copying every post to the clipboard before submitting, in case I lock the ticket while you're typing.

agentgt commented 10 months ago

@jgonggrijp

Yeah I'm finally grokking what you have cooking. I will be unavailable for several days (in case you are waiting for a response) but I think we are on track!

Perhaps we start with some markdown document (separate from the existing spec or included, does not matter to me) and work from there, or do you think you/we still need more discovery?

jgonggrijp commented 10 months ago

@agentgt If you feel like writing a markdown document, by all means do so. I wouldn't be inclined to take that as a next step myself, but I will happily read it and contribute to it if you do.

By the time a new feature makes it into the spec, I think ideally the following things need to have happened:

I would like to develop a clear idea about what to do with frame selection and then write an initial implementation of power lambdas in Wontache (with or without frame selection). My implementation will remain fluid until a specification for it has been merged, so I can inform and adjust my implementation based on discussion and your Markdown document.

jgonggrijp commented 10 months ago

As I previously announced, I enabled GitHub Discussions and created new places for the topics that we have discussed here. Hopefully, we will be able to keep those new places more focused.

The following discussions take off where this discussion (and some others) left off:

@agentgt and @determin1st, if you would like to continue discussing template engine benchmarks, I encourage you to start a new discussion for that in General.

Show and tell is the ideal place for discussions about the ins and outs of specific implementations such as JStachio and sm-mustache.

I will now close and lock this ticket, to prevent it from becoming even longer and harder to disentangle.