marko-js / marko

A declarative, HTML-based language that makes building web apps fun
https://markojs.com/
MIT License
13.24k stars 643 forks source link

Proposal: Replace HTML parser with a new parser that recognizes attribute types #90

Closed patrick-steele-idem closed 8 years ago

patrick-steele-idem commented 9 years ago

Currently, Marko is using an HTML parser. An HTML parser treats every attribute value as a String type regardless of how it is written.

For example, with an HTML parser the following lines are all equivalent:

<my-component foo=123/>
<my-component foo="123"/>
<my-component foo='123'/>

As a result of this limitation, Marko relies on custom tag schemas to associate a type with an attribute value or the developer must use the ${<javascript-expression} syntax. Tag schemas are great for documentation but they are a pain to maintain and they make usage of a custom tag ambiguous. Not to mention, the ${<javascript-expression} syntax can only be used with attributes that have a string type (not expression, boolean, etc.).

The author of the <my-component> tag can separately declare the type for an attribute and developer viewing a template wouldn't know the attribute type unless you consulted the tag schema: It would be great if the parser used by Marko treated all attribute values as JavaScript expressions. Spaces will end an attribute value unless the space is in a quoted string or within a parenthesis/square brackets/curly braces block. Finally, we would support ${<javascript-expression>} usage within ES6 Template strings (i.e. strings "quoted" using backticks). The following illustrates what is being proposed.

<my-component number=1/>
<my-component variable=name/>
<my-component complex-expression=1+2/>
<my-component complex-expression-with-spaces=(a + b)/>
<my-component simple-string="hello"/>
<my-component simple-string='hello'/>
<my-component dynamic-string=`hello ${name}`/>
<my-component string-concatenated=("hello "+name)/>
<my-component boolean=true/>
<my-component array=[1, 2, 3]/>
<my-component object={hello: 'world'}/>
<my-component function-call=data.foo()/>
<my-component complex-function-call=(data.foo() + data.bar())/>

Now that we are using a custom parser, we can also improve the syntax for conditionals and looping as @onemrkarthik suggested:

<div if(x > 5)>
   Do something
</div>

<ul>
   <li for(color in colors)>
       ${color}
   </li>
</ul>

<ul>
   <li for(color in colors; status-var=loop)>
       ${loop.getIndex+1}) ${color}
   </li>
</ul>

Here's what for-loops and conditionals would look like as tags:

<if(x > 5)>
    <div>
        Do something
    </div>
</if>

<ul>
    <for(color in colors)>
        <li>
            ${color}
        </li>
    </for>
</ul>

<ul>
    <for(color in colors; status-var=loop)>
        <li>
            ${loop.getIndex+1}) ${color}
        </li>
    </for>
</ul>

Thoughts? Concerns?

SunnyGurnani commented 9 years ago

How about any JavaScript Object and JavaScript Arrays ?

my-component dynamic-int="${1==1? 25 : 26}"

viviangledhill commented 9 years ago

I think it's a good idea to remove the noise of extra files. Making this part of the (marko-)html makes sense. :+1:

@SunnyGurnani I assume that should also work as Patrick mentioned support for the ${...} syntax.

patrick-steele-idem commented 9 years ago

Hey @SunnyGurnani, as for JavaScript arrays and objects, we should make sure the following works as well:

<my-component array=[1, 2, 3]/>
<my-component object={hello: 'world'}/>

I'll updated the proposal. Thanks for the feedback

pswar commented 9 years ago

If you want to consider attribute value as JS expression, would you also support calling functions?

<my-component count=results.count() />
patrick-steele-idem commented 9 years ago

Hi @pswar, good question. Yes, any valid JavaScript expression is allowed on the right-hand side and will be copied to the compiled JavaScript template verbatim. I'll update the proposal to make this clear. Thanks for the feedback

philidem commented 9 years ago

Do you think that for parsing simplicity we should require expressions to be wrapped with parentheses? If you can get it work without parentheses that would be great but I could see how that might be tricky.

For example, consider:

<my-component object=someFunc (1) />

That space after someFunc might look like a new attribute to the parser.

This would be safer in all cases:

<my-component object=(someFunc (1)) />
patrick-steele-idem commented 9 years ago

Hey @philidem, I see where you are coming from, but requiring parenthesis would actually make things more confusing I think. It would be a bad idea to require that variable names be wrapped in parenthesis like the following:

<my-component variable=(foo) />

The following is better:

<my-component variable=foo />

In this case, foo just happens to be a simple variable expression (not a function call).

I think using a space (outside of a code block) is the best way to delimit attributes.

vedam commented 9 years ago

Hi, maybe it's worth considering the upcoming ES6 Template-Strings. https://babeljs.io/docs/learn-es2015/#template-strings

If the proposal is a thought about getting the parser more towards js, the following:

<my-component dynamic-string='hello ${name}'/>

would lead towards:

<my-component template-string=`hello ${name}`/>

(yes, the backticks) if I got it right. Not to talk about multiline strings. And we'll be able to omit string-concatenation

<my-component string-concatenated=("hello "+name)/>

or am I completely wrong greets Achim

patrick-steele-idem commented 9 years ago

Hi @vedam, you bring up a good question regarding the use of backticks for template strings. Adopting backticks would make things more like JavaScript, but my initial reaction is that why not allow number quoted strings to be treated as template strings? We are processing all of these strings at compile-time to find the dynamic parts so there is no performance hit at runtime. Also worth mentioning is that we already support dynamic parts in normal quoted strings so switching to backticks would make things potentially problematic for developers migrating to the new version (migration is also a very important topic that needs to be discussed since this proposal would definitely be a breaking change). The only negative about not using backticks is that developers would need to escape the $ character inside quoted strings if it is not intended for a dynamic part. I'm leaning towards not supporting backticks, but I would like to get more thoughts from others since I think it is definitely worth discussing some more.

vedam commented 9 years ago

@patrick-steele-idem, I understand the migration-argument completely. For me as a total marko-noob, my thoughts were totally js-driven. It's the language I'm most familiar with, loving the new possibilities and features of the upcoming es6. And with something like these...

<my-component string-concatenated=("hello "+name)/>
<my-component complex-function-call=(data.foo() + data.bar())/>

...I have to leave my way of js-thinking towards some marko-specials. If everything after the "=" is familiar js I don't have to think about marko. It just works. Maybe this kind of thinking is too naive.

patrick-steele-idem commented 9 years ago

Hey @vedam, I'm actually okay with supporting backticks, but I would want to transpile the code to its ES5 equivalent so that it works in older browsers that do not support that ES6 feature.

For example,

`foo ${name}`

Would be transpiled to the following:

'foo ' + name

I think the end result would be the same. I agree with your sentiment that the right-hand should just be JavaScript...this makes Marko more intuitive and more powerful.

If we agree to support backticks, then the question becomes: should we also support dynamic parts in normal quoted strings or should dynamic parts only be allowed in ES6 template strings?

philidem commented 9 years ago

:+1: for supporting string templates with backticks

I think usage of backticks is a good idea and I think the Marko compiler could easily convert the string template to ES5 JavaScript.

Let's make this happen!

philidem commented 9 years ago

To @patrick-steele-idem's question, I would suggest only support variable replacement in string templates which are delimited with backticks. That is, don't do variable replacement within strings delimited with " and '.

Since migration tool seems necessary, I think we should convert any existing string attribute values to ES6 templates (using backticks).

SunnyGurnani commented 9 years ago

My Suggestion would be to allow all javascript expression in ${} only to avoid any confusion.

With proposed method it would be very confusing for anyone who is new to Marko to figure out which is plain text and which is javascript expression.

patrick-steele-idem commented 9 years ago

Hi @SunnyGurnani, what is being proposed is that the right-hand side is always a JavaScript expression. If it is valid JavaScript expression then it can go on the right hand side and it will actually be copied into the compiled JS verbatim. Even if right-hand value is a simple string it would still be a valid JavaScript expression. I think this will make things the least confusing.

The only special case is for attributes that are interpreted at compile-time (e.g. for="color in ['red', 'green', 'blue'])

Seem reasonable?

pswar commented 9 years ago

As i think more about it, i think what @SunnyGurnani proposed makes more sense and less confusing.

Few things to consider:

  1. 'hello ${name}' is not a valid JS expression
  2. If you want to actually have a string "[1, 2, 3]" as attribute value in mystring=[1, 2, 3] then it would have to be encoded which would make it look odd, otherwise it becomes array value.
patrick-steele-idem commented 9 years ago

Thanks for the comments @pswar. Some related thoughts:

1) I think we are saying the same thing. I'm in favor of only allowing ${...} in strings that use back ticks as "quotes" (not in strings quoted using double or single quotes). For example:

<my-component my-dynamic-string=`Hello ${data.name}!`/>

On top of that, we would transpile ES6 template strings to an ES5 equivalent.

2) If you want to have a string of "[1, 2, 3]" then you would just do: mystring="[1, 2, 3]". If you want an Array of [1, 2, 3] then you would just omit the quotes: myarray=[1, 2, 3]

pswar commented 9 years ago

:+1:

patrick-steele-idem commented 9 years ago

NOTE: I have updated the proposal to reflect that ${...} should only be allowed in ES6 template strings.

SunnyGurnani commented 9 years ago

Thanks @pswar and @patrick-steele-idem

JavaScript has given us the freedom from Classes and Types with current proposal we are taking away all that beauty and simplicity. I personally would prefer simplicity ${} for all expressions and let library handle the rest.

Current proposal suggest different implementation based on individual Types as following. However in JavaScript we never define type in the parameter or declare variable with type.

For dynamic string use ES6 strings component attribute=`hello ${name}'

Single function don't use anything component attribute=data.foo()

For Multiple Functions use () component attribute=(data.foo() + data.bar())

For Object use {} component attribute={hello: 'world'}

For Array use [] component attribute=[1,2,3]

@pswar I don't see any problem in following expression. Hello is static and ${name} is JavaScript expression. Please correct me if I am wrong? component attribute="hello ${name}"

DanCech commented 9 years ago

I was following up until the proposal that ${...} only be allowed within backtick-quoted attribute values, which would be a significant BC break. I understand the desire to simplify, but that seems a step too far to me vs retaining the existing behavior where quoted attribute values are scanned for ${...} replacements.

patrick-steele-idem commented 9 years ago

Hi @DanCech, this is still a developing proposal so nothing is set in stone. My thinking is that this is definitely going to be a breaking/major change and old templates will need to be migrated to work correctly with the new Marko version. We would want to make the migration as easy as possible so at a minimum we would need to provide a tool that would automatically update existing Marko templates to conform to the new parser by doing things such as the following:

Old:

<my-component foo="Hello ${data.name}"/>

New:

<my-component foo=`Hello ${data.name}`/>

As another example, if a custom "foo" attribute is declared as an "expression" type then the following conversion would happen:

Old:

<my-component foo="data.name"/>

New:

<my-component foo=data.name/>

Beyond that, if you are concerned that it would be hard to unlearn the old handling of ${...} in attribute values or that you still prefer all ${...} to be processed in all strings then that would be a good argument for going back to the old behavior and allowing ${...} in all quoted strings. Honestly, I could go either way since I see the value in both choices.

patrick-steele-idem commented 9 years ago

Hi @SunnyGurnani, I think there might be a misunderstanding because with the updated proposal we want to go closer to the simplicity of pure JavaScript. Instead of tag schemas, we are saying that an attribute value is simply a pure JavaScript expression and nothing more and nothing less. Marko has always been close to JavaScript by design but with the latest proposal we want to take it even closer. That is, Marko aims to be a perfect blend of JavaScript and HTML :)

patrick-steele-idem commented 9 years ago

Also, @SunnyGurnani, the parenthesis are only needed in situations where a "complex" expression has spaces that would have otherwise ended the attribute value.

DanCech commented 9 years ago

@patrick-steele-idem I understand where you're coming from, and in the end either way will work. I guess the way I was looking at the interpolation within quoted attributes question was that it's simple to think of replacement working the same way there as it does within CDATA sections, since they're both "text" parts of HTML.

One question this does bring up is how the different quoting styles in the template would affect the quoting in the generated HTML. For example, does this:

<my-component foo=`Hello ${data.name}`/>

compile to:

<my-component foo="Hello Dan" />

or:

<my-component foo='Hello Dan' />

or something else?

patrick-steele-idem commented 9 years ago

The idea is that the foo attribute in the following Marko template:

<my-component foo=`Hello ${data.name}`/>

Would compile to the following property definition in JavaScript code:

foo: "Hello " + data.name

NOTE: there would no signs of the backticks in the final compiled output because we want the code to work in ES5 and not just ES6

We could absolutely interpolate ${...} in normal quoted JavaScript strings, but then we are slightly changing the rules of JavaScript which may be surprising to some users if we say "the right-hand side of an attribute is just JavaScript".

SunnyGurnani commented 9 years ago

I got your point @patrick-steele-idem

I think then it would help to remove/depricate ${} from the language. is there a need to use ES6 backticks ? Cant we just use following ? my-component string-concatenated=("hello "+name)

This would avoid confusion and make Marko much easier to learn for people new to language.

patrick-steele-idem commented 9 years ago

Hi @SunnyGurnani, technically we don't need to support ${...} but just just makes the code easier to write and read compared to using string concatenation.

It did occur to me that we also currently support $<var-name> (without the curly braces) which is technically not part of the ES6 template strings spec. We could require ${<var-name>} but that seems like it would require extra code for end users unnecessarily. That is something we should also discuss. Thanks for the feedback so far!

DanCech commented 9 years ago

@patrick-steele-idem So are we considering this just got custom tags or more generally? My gut reaction would be that the syntax for custom tags should be as similar to "normal" tags as possible, which is where my (admittedly rushed) example came from, thinking about these template semantics in terms of defining attributes on regular tags.

bard commented 8 years ago

A big plus of marko for me is that it can be used with editors that understand HTML/XML (syntactically at least) instead of treating it like a big lump of characters, providing e.g. live linting and structure-aware navigation. In order to do so they need to parse it and that would likely be broken by adopting a custom syntax for attributes.

aaronshaf commented 8 years ago

Coming from JSX-land, I suggest:

<my-component foo={`Hello ${data.name}`} />

It would be great to have more overlap between the two standards. Many of us using marko on the back will be using Angular 2 or React/JSX on the front. I personally render React components and inject them into marko templates.

patrick-steele-idem commented 8 years ago

And then you have JavaScript template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings

The problem with { ... } is that it appears all of the time in HTML markup. For example, inline style sheets:

<style>
.foo {font-weight: bold;}
</style>
patrick-steele-idem commented 8 years ago

Read your comment too fast, @aaronshaf. I thought you were suggesting { ... } over ${ ... } (which other templating languages choose... including Angular and Dust).

My problem with supporting the following:

<my-component foo={`Hello ${data.name}`} />

Is that I really feel like it would be cleaner if the right-hand side of an attribute is always treated as a JavaScript expression. The following code is a valid JavaScript expression:

`Hello ${data.name}`

The following is also a valid JavaScript expression despite the additional surrounding parenthesis added for grouping:

(`Hello ${data.name}`)`

But the following is not a valid JavaScript expression:

{`Hello ${data.name}`}

(curly braces are only valid for block statements)

That is, you can't do the following in JavaScript:

var foo = {`Hello ${data.name}`};

I would rather not introduce new symbols for the sake of introducing new symbols if that makes sense.

onemrkarthik commented 8 years ago

For conditional operators, how about something like this:

<div if(x > 5)>
   Do something
</div>

instead of

<div if=(x > 5)>
   Do something
</div>

For iterators,

<ul>
   <li for(color in colors)>
       ${color}
   </li>
</ul>

instead of

<ul>
    <li for="color in colors">
        ${color}
    </li>
</ul>
patrick-steele-idem commented 8 years ago

Great suggestion, @onemrkarthik. I like that the for(color in colors) and if(x > 5) directives match up very nicely to pure JavaScript. I'm going to update the original proposal to include your suggestions. Thanks!

onemrkarthik commented 8 years ago

@patrick-steele-idem I think the suggestion needs to be fixed to

<div if(x > 5)>
   Do something
</div>

I see an '=' after the if, in the suggestion.

patrick-steele-idem commented 8 years ago

Fixes. Thanks, @onemrkarthik

kristianmandrup commented 8 years ago

My gut reaction to this proposal was: "OH NO, NOT ANGULAR 2 SYNTAX!!!"

But reading a bit further on, I fully understand the reasoning. More power, less limits!!! However I don't think it should be "all or nothin", breaking existing templates etc. My proposal: Allow the HTML "flavor" described above, but also allow/enable it while keeping in full HTML attribute compatibility mode, ie x="y".

<div x=(5-2*sqr(2))/>

AND

<div x.e="(5-2*sqr(2))"/>

Where the .e means that the internals should be evaluated as pure javascript and returned as:

<div x=(5-2*sqrt(2))/>

Then evaluated as javascript to finally render <div x="12"/>

It think this combines the best of both worlds (and would allow me to keep using jade to marko compilation!). The same principle would apply for string interpolation etc., ie. simply having .e attribute name postfix means a first "compilation" phase, basically just stripping the "s and understanding the intention as javascript evaluation :)

patrick-steele-idem commented 8 years ago

Great feedback @kristianmandrup. What are your thoughts on the fragmentation that would result from supporting two different syntaxes?

philidem commented 8 years ago

FYI, I started working on a new parser that will better understand JavaScript expressions in HTML attributes (e.g. <custom message="Hello ${name}!" count=100 large=true complex=(a + b)>). The key difference between the existing HTML parsers is that attributes will not need to be surrounded by quotes (quotes should only be used if the attribute value will be of type string).

See https://github.com/philidem/htmljs-parser

The base HTML language is handled just fine but I am still working on:

There's also going to be additional work inside Marko to switch to new parser and there will be some breaking changes that need to be addressed with a template migration tool perhaps.

If anyone has some additional thoughts about the Marko syntax and improvements that could be made, then please let me know! Thoughts and feedback on the parser are also welcome.

patrick-steele-idem commented 8 years ago

Nice work, @philidem! Thank you for the hard work to make this happen. This will be a huge improvement to Marko and it will be nice to be able to do away the marko-tag.json files in most cases.

We'll definitely have to think about transforming old templates. The big thing is that if an attribute is declared as an expression then we will need to transform the template to drop the attribute quotes. For example:

Old:

<my-custom-tag data="data.someFunction()">

New:

<my-custom-tag data=(data.someFunction())>

We'll also need to transform if, for, else-if and else(?) attributes to use the new directive syntax.

I'm sure there will be some other transformations as well, but it is definitely doable.

Thanks again.

kristianmandrup commented 8 years ago

Looks very cool :) Good job!!!

kristianmandrup commented 8 years ago

Hi @philidem,

When can we expect to begin experimenting with this or contribute? How do we hook the parser into markojs to replace the default Parser? Please add usage/install instructions. Cheers :)

philidem commented 8 years ago

Still actively working on the parser and I'll push some more changes later today. There's still a lot of heavy work to do in Marko to leverage the parser and I think we're leaning toward making breaking changes while also providing a migration tool.

So nothing in the immediate future will be available but still working hard to get this done! I'll probably have to get feedback from @patrick-steele-idem when it is time make changes to Marko. Perhaps we can document the plan to see if we can help from the community as well.

patrick-steele-idem commented 8 years ago

We could definitely start on the migration tool right now so that it would be ready and well-tested in time for the switch to the new parser. That's an area where we could definitely use some more help from the community. I'll open another Github issue for that task.

patrick-steele-idem commented 8 years ago

Thoughts on the following?:

<var foo=1 hello='World'/>

Output JavaScript:

var foo=1;
var hello='World';

Old style:

<var name="foo" value="1"/>
<var name="hello" value="'World'"/>
philidem commented 8 years ago

Yes, like <var foo=1 hello='World'/>!

Also, I would like to propose this:

Declare function:

<function doSomething(a, b, c)>
</function>

Call function:

<doSomething(a, b, c) />
patrick-steele-idem commented 8 years ago

Also, we currently support simple conditionals in Marko:

<div class="{?data.isActive;active}">

How should we support that with the new parser?

This should of course work in the new parser, but it is a little uglier:

<div class=data.isActive?'active':''>
yomed commented 8 years ago

re: multiple variables, I don't know. I think most javascript styles these days prefer writing out one variable per line anyway.

patrick-steele-idem commented 8 years ago

@yomed one variable per line is a coding style. Doesn't really matter to the compiler and it could go either way.