Reconcile difference between dynamic tags, literal tags, and manual creation

yewstack / yew

Rust / Wasm framework for creating reliable and efficient web applications

https://yew.rs

Apache License 2.0

30.7k stars 1.42k forks source link

Reconcile difference between dynamic tags, literal tags, and manual creation #1269

Open siku2 opened 4 years ago

siku2 commented 4 years ago

Problem

With the introduction of dynamic tags in #1266 there are now three different ways of constructing HTML tags:

Using literal tags in the html! macro (<span></span>)
Dynamic tags in the macro (<@{"span"}></@>)
Creating VTag instances manually (VTag::new("span"))

All three of these have different restrictions with the third option (manually constructing) having virtually none at all. Dynamic tags try to have the same restrictions as literal ones using runtime checks but there are still a few differences:

dynamic tags are always converted to lower case. While <sPAN></sPAN> and <@{"sPAN"}></@> produce equivalent HTML, Yew currently stores the former as "sPAN" internally and the latter as "span".
literal tags must be valid rust identifiers with the exception that they may contain hyphens (-) anywhere in the middle. Dynamic tags can use any ASCII character at any position.
void elements must be self closing using literal tags <br/> but dynamic tags can do <@{"br"}></@>. I think this one is actually a good thing because it gives dynamic tags more flexibility without any downside.

There's another issue that currently plagues both macro methods. They only allow tag names containing characters in the ASCII range. This is too broad for normal tag names which only allow for alphanumerics^[1] but it's also too narrow for custom elements which support a big range of characters outside of the ASCII range^[2].

Somewhat related to this issue is that Yew currently preserves the casing for tag names. This can easily lead to bugs like this:

let vtag = VTag::new("dIV");
if vtag.tag == "div" {
  println!("it's a div!");
}

Proposal

All three methods should have exactly the same rules. For the literal case these rules should be enforced during compilation while the other two need runtime checks.

Valid Tag Names

The VTag constructor ensures the given tag name is valid. It might be a good idea to use a wrapper type TagName internally. This type could be treated just like a normal string but it can only contain valid tag names. It would also make it easier to ignore casing when comparing names.

Void Element

Void elements must not have any children. To guard against this we store a allow_children flag in VTag which is set in the constructor by comparing the tag name against the list of void elements. The add_child method is modified to check this flag first.

Special Attributes

The attribute value needs to be handled differently depending on the tag name. This can be done by adding a check to the set_value method which forwards the value to the attributes instead if the tag isn't input or textarea.

In all these cases the literal tags in macros should already perform the checks during compilation. To avoid the performance penalty there should be "private" methods which avoid the checks entirely which can then be used by the macro.

There should also be equivalent methods that return a Result instead of panicking. This gives us a structure like this:

// used by normal users
fn do() -> T {
  try_do().expect("appropriate message")
}

// can be used by libraries and cases where an error is expected
fn try_do() -> Result<T, E> {
  // perform the various checks

  Ok(unchecked_do())
}

// used by macro
fn unchecked_do() -> T {
  // do the thing
}

philip-peterson commented 4 years ago

Per the discussion we had on #1344 , I would question why we need dynamic strings as tag values at all. For HTML, MathML, and SVG elements, these should be easy to represent as some built-in Yew enum which could be represented as a &'static str in code but converted to an enum value through macros.

In the case of custom elements, maybe that could be declared via some configuration, possibly in Cargo.toml or just in Yew config, stored as a type inside App or in some other Yew config location. That way we can avoid checking every tag name is valid at runtime as well as needing to bundle a list of valid tags in the main bundle.

philip-peterson commented 4 years ago

~~That said, I could see a use case for this type of pattern as well which could be good to keep in mind when we are implementing a solution for this Issue:~~

let z = if some_condition {
  Box
} else {
  Circle
};

return <@{z}></@>

philip-peterson commented 4 years ago

After doing some more research, a couple of modifications to my stance:

The branching on types I suggested in the above comment seems unwise / out of scope, since it would be moving type resolution to runtime, escaping Rust's type safety. A generic could achieve the same result.
Sorry, I didn't realize dynamic tags are basically brand-new. I can understand the want for them if regular HTML-style tag instantiation isn't working for custom tags with non-ASCII names, although still not sure about the differences long-term between those two styles. Was access to custom elements the main motivation for creating them?
Although WhatWG specifies that custom tags should be lowercase, browsers still let you reference them with uppercase versions. However, not all upper/lower case variants are defined the same. For instance, Chrome lets you define a custom tag called image-a and then when you instantiate it as <image-A />, it works. However, if you define it as image-ê and reference it as image-Ê, it doesn't work. So we must be mindful of the different notions Unicode has of capitalization and make sure we don't stray beyond actual browser behavior. Firefox, interestingly, does not however allow a tag named image-ê to be defined even.

One more observation- although SVG has some elements that are defined as having capital letters in them such as <linearGradient />, the browsers I tested (Chrome, Firefox, Safari) all respond to <lineargradient /> as well.

siku2 commented 4 years ago

Using an enum which contains all possible tag names isn't a terrible idea. We can use it in conjunction with a tag_name! macro that returns the appropriate variant from a string. Custom element names could be stored in a special variant:

enum TagName {
  Div,
  // ...
  Custom(String),
}

Using this dynamic tag names would no longer accept a String but instead a variant of TagName.

That way we can avoid checking every tag name is valid at runtime as well as needing to bundle a list of valid tags in the main bundle.

In most cases this doesn't apply anyway. The most common case is creating a tag name through the macro where the names are verified at compile-time. The only case where we need to check tag names at runtime is for dynamic tag names (be it through the macro or using VTag::new). As long as we have the ability to create dynamic tags there's no way to avoid a runtime check. Even if we have an enum we would have to implement From<String> on it.

Was access to custom elements the main motivation for creating them?

No, it sure is a nice side-effect but the primary reason for them is to create dynamic tag names. There are cases where using a match statement would be needlessly verbose. It also works for higher order components that allow you to pass the tag type.

To summarise: I'm not against the idea of using an enum for tag names. In fact I've thought about it a lot myself. However, I think we need to make a strong case as to why it's beneficial in the long run. I feel like this is something @jstarry would need to weigh in on.

Although WhatWG specifies that custom tags should be lowercase, browsers still let you reference them with uppercase versions.

That's exactly what the standard dictates:

In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.

if you define it as image-ê and reference it as image-Ê, it doesn't work.

Custom element names aren't case-insensitive in the same way. They're not allowed to use ASCII upper alpha but they any other character is case-sensitive.

One more observation- although SVG has some elements that are defined as having capital letters in them

Although tag names are case-insensitive the standard sometimes defines a canonical representation. This isn't really relevant to us because the representation doesn't matter for the lookup.

philip-peterson commented 4 years ago

Been thinking about this some more...

All three methods should have exactly the same rules

I'm not sure whether this is possible. It seems like macros are not able to parse arbitrary UTF8 sequences from files... which some custom tag names have (within restriction). That said, the capitalization issues are probably at least something we can fix.

Dynamic tags can use any ASCII character at any position.

This isn't quite true, dynamic tags have to begin with a non-hyphen. There is some question though about whether we could expand the definition of a dashed name to be more inclusive, although I suspect it would only be by a little, such as having a hyphen at the end perhaps.

We can use it in conjunction with a tag_name! macro that returns the appropriate variant from a string. Custom element names could be stored in a special variant

I assume by this it was meant that in the normal case, one would not have to use this macro?

I suppose we could have the HTML tag macro accept any of div, span, table, etc. which are actually Rust symbols imported from a prelude somewhere, corresponding to say yew::tags::{div, span, table}. And then, if someone wanted to specify a custom tag, they could import yew::tags::{Custom} and do

<Custom("my-custom-tag")>{ "foo" }</Custom("my-custom-tag")>

Assuming this works, would the tag_name! macro still be needed?

siku2 commented 4 years ago

@philip-peterson Just a quick clarification:

Dynamic tags can use any ASCII character at any position.

This isn't quite true, dynamic tags have to begin with a non-hyphen. There is some question though about whether we could expand the definition of a dashed name to be more inclusive, although I suspect it would only be by a little, such as having a hyphen at the end perhaps.

I was talking about how it's currently implemented, not how it should be. You can see the runtime check that is inserted here: https://github.com/yewstack/yew/blob/151b5b7db954e6d3f5f6ad5be0c0991131adde26/yew-macro/src/html_tree/html_tag/mod.rs#L110-L124

All three methods should have exactly the same rules

I'm not sure whether this is possible. It seems like macros are not able to parse arbitrary UTF8 sequences from files... which some custom tag names have (within restriction). That said, the capitalization issues are probably at least something we can fix.

You're right, I mentioned that literal tags need to consist of valid rust identifiers. If you try to use invalid characters then compilation fails even before the macro comes into play so there's really nothing we can do about this. In these rare cases dynamic tag names should come in handy.

We can use it in conjunction with a tag_name! macro that returns the appropriate variant from a string. Custom element names could be stored in a special variant

I assume by this it was meant that in the normal case, one would not have to use this macro?

I suppose we could have the HTML tag macro accept any of div, span, table, etc. which are actually Rust symbols imported from a prelude somewhere, corresponding to say yew::tags::{div, span, table}. And then, if someone wanted to specify a custom tag, they could import yew::tags::{Custom} and do

<Custom("my-custom-tag")>{ "foo" }</Custom("my-custom-tag")>

Assuming this works, would the tag_name! macro still be needed?

The tag_name! macro I have in mind actually works to enhance the setup you mentioned. Let's look at the syn crate. Syn has a Token! macro which is very similar to what I have in mind. There is a module syn::token that contains various Rust tokens and keyword (very similar to your tags module). Instead of accessing the members of said module directly syn recommends you use the Token macro. Token![static] then resolves to syn::token::Static.

Here's a few benefits of doing it this way:

No need to bring the tags into scope or memorise the struct name. We can just use the tags exactly like we would write them in HTML.
We can easily detect custom element names and verify their validity at compile time (Of course there's still the problem with unicode characters but this is easily solved by allowing literal strings in the macro).
Using the macro we can provide more detailed feedback to the developer.
We can integrate this macro directly into the html! macro. html! { <div/> } -> VTag::new(tag_name!(div)) -> VTag::new(yew::tags::Div).

teymour-aldridge commented 4 years ago

I personally think that dynamic tags aren’t necessary and are just unnecessarily increase binary sizes.

IMO they’re a “solution in search of a problem.”

They make Yew unsafe because the statics of a program don’t line up with the dynamics (by their nature) which makes it easy for mistakes to creep in (this isn’t really idiomatic Rust which I think is fundamentally about being safe – safe in the sense of having aligning statics and dynamics).

It would be nice to see some examples of why dynamic tags are needed and use cases in which a problem can be better solved by using a dynamically computed tag name than using the features Rust already provides.

Please let me know if I’ve misunderstood anything.

On 7 Jul 2020, at 12:31, Simon notifications@github.com wrote:

@philip-peterson https://github.com/philip-peterson Just a quick clarification:

Dynamic tags can use any ASCII character at any position.

This isn't quite true, dynamic tags have to begin with a non-hyphen. There is some question though about whether we could expand the definition of a dashed name to be more inclusive, although I suspect it would only be by a little, such as having a hyphen at the end perhaps.

I was talking about how it's currently implemented, not how it should be. You can see the runtime check that is inserted here: https://github.com/yewstack/yew/blob/151b5b7db954e6d3f5f6ad5be0c0991131adde26/yew-macro/src/html_tree/html_tag/mod.rs#L110-L124 https://github.com/yewstack/yew/blob/151b5b7db954e6d3f5f6ad5be0c0991131adde26/yew-macro/src/html_tree/html_tag/mod.rs#L110-L124 All three methods should have exactly the same rules

I'm not sure whether this is possible. It seems like macros are not able to parse arbitrary UTF8 sequences from files... which some custom tag names have (within restriction). That said, the capitalization issues are probably at least something we can fix.

You're right, I mentioned that literal tags need to consist of valid rust identifiers. If you try to use invalid characters then compilation fails even before the macro comes into play so there's really nothing we can do about this. In these rare cases dynamic tag names should come in handy.

We can use it in conjunction with a tag_name! macro that returns the appropriate variant from a string. Custom element names could be stored in a special variant

I assume by this it was meant that in the normal case, one would not have to use this macro?

I suppose we could have the HTML tag macro accept any of div, span, table, etc. which are actually Rust symbols imported from a prelude somewhere, corresponding to say yew::tags::{div, span, table}. And then, if someone wanted to specify a custom tag, they could import yew::tags::{Custom} and do

<Custom("my-custom-tag")>{ "foo" }</Custom("my-custom-tag")>

Assuming this works, would the tag_name! macro still be needed?

The tag_name! macro I have in mind actually works to enhance the setup you mentioned. Let's look at the syn https://docs.rs/syn/1.0.33/syn/ crate. Syn has a Token! https://docs.rs/syn/1.0.33/syn/macro.Token.html macro which is very similar to what I have in mind. There is a module syn::token https://docs.rs/syn/1.0.33/syn/token that contains various Rust tokens and keyword (very similar to your tags module). Instead of accessing the members of said module directly syn recommends you use the Token macro. Token![static] then resolves to syn::token::Static https://docs.rs/syn/1.0.33/syn/token/struct.Static.html.

Here's a few benefits of doing it this way:

No need to bring the tags into scope or memorise the struct name. We can just use the tags exactly like we would write them in HTML. We can easily detect custom element names and verify their validity at compile time (Of course there's still the problem with unicode characters but this is easily solved by allowing literal strings in the macro). Using the macro we can provide more detailed feedback to the developer. We can integrate this macro directly into the html! macro. html! {
} -> VTag::new(tag_name!(div)) -> VTag::new(yew::tags::Div). — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/yewstack/yew/issues/1269#issuecomment-654791395, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKFSTPMZELK5ZF6NKG7PG63R2MBQFANCNFSM4NNFF4SQ.

siku2 commented 4 years ago

@teymour-aldridge Just to clarify: I don't particularly care whether Yew supports dynamic tag names or not - I'm playing devil's advocate here. This discussion also isn't only about dynamic tag names. The same thing is possible with VTag::new but with even less restrictions.

Here's an example of something that dynamic tags make possible: Let's say we have a typography component for headings which accepts the level (h1 - h6) as a prop.

This is what it looks like using dynamic tag names:

// ensure that 1 <= level <= 6
let h_tag = format!("h{}", level);
<@{h_tag}>{ contents }</@>

And here's the same without:

match level {
  1 => html! { <h1>{ contents }</h1> },
  2 => html! { <h2>{ contents }</h2> },
  3 => html! { <h3>{ contents }</h3> },
  4 => html! { <h4>{ contents }</h4> },
  5 => html! { <h5>{ contents }</h5> },
  6 => html! { <h6>{ contents }</h6> },
  _ => panic!("invalid header level"),
}

This isn't terribly bad but I'm doing it a lot of favours such as not using any attributes. Just imagine what it would look like if we wanted to add an onclick listener to this. Dynamic tags are only really useful for higher order components but it's important that we support this because it makes it a lot easier to create component libraries.

Let's try to abstract this a bit more. The underlying problem here is that we're unable to propagate tag names. There's no way for our hypothetical component to accept h1 - h6 as a value without using an opaque representation like a string or a number representing the level. In the html macro tag names work like types but outside of it these types don't exist; they're just strings. Dynamic tag names are a patch solution for the inability to pass tag names around transparently.

All of this could be solved if we adopted a solution where each tag name is assigned a type. It would give us the ability to transparently pass tag names to components with type checking. Here's what the same component might look like with statically typed tag names:

use yew::tags::{H1, H2, H3, H4, H5, H6};

// this is just a marker trait to ensure that `Heading` is only used with heading tags.
trait HTag {}
impl HTag for H1 {}
impl HTag for H2 {}
impl HTag for H3 {}
impl HTag for H4 {}
impl HTag for H5 {}
impl HTag for H6 {}

struct Props<T> {
    tag: T,
    text: String,
}
struct Heading<H: HTag> {
    props: Props<H>,
}
impl<H: HTag> Component for Heading<H> {
    type Props = Props<H>;

    fn view(&self) -> Html {
        let Props { tag, text } = &self.props;
        html! {
            <@{tag}>{ text }</@>
        }
    }
}

Note that in this example the syntax for dynamic tag names no longer accepts strings, only members of the yew::tags module. How this is to be achieved is still up for discussion but either way, this accomplishes type safety and doesn't depend on any runtime checks.

teymour-aldridge commented 4 years ago

Thanks for this explanation – it’s really helpful!

I also now see why dynamic tags can be useful.

On 7 Jul 2020, at 13:31, Simon notifications@github.com wrote:

@teymour-aldridge https://github.com/teymour-aldridge Just to clarify: I don't particularly care whether Yew supports dynamic tag names or not - I'm playing the devil's advocate here. This discussion also isn't only about dynamic tag names. The same thing is possible with VTag::new but with even less restrictions.

Here's an example of something that dynamic tags make possible: Let's say we have a typography component for headings which accepts the level (h1 - h6) as a prop.

This is what it looks like using dynamic tag names:

// ensure that 1 <= level <= 6 let h_tag = format!("h{}", level); <@{h_tag}>{ contents }</@> And here's the same without:

match level { 1 => html! {
{ contents }
}, 2 => html! {
{ contents }
}, 3 => html! {
{ contents }
}, 4 => html! {
{ contents }
}, 5 => html! {
{ contents }
}, 6 => html! {
{ contents }
}, _ => panic!("invalid header level"), } This isn't terribly bad but I'm doing it a lot of favours such as not using any attributes. Just imagine what it would look like if we wanted to add an onclick listener to this. Dynamic tags are only really useful for higher order components. It's important that we support this because it makes it a lot easier to create component libraries.

Let's try to abstract this problem a bit more. The underlying problem here is that we're unable to propagate tag names. There's no way for our hypothetical component to accept h1 - h6 as a value without going using an opaque representation like a string or a number representing the level. In the macro tag names work like types but outside of it they don't exist; they're just strings. In a way, dynamic tag names are a patch solution for the inability to pass tag names around transparently.

All of this could be solved if we adopted a solution where each tag name is assigned a type. It would give us the ability to transparently pass tag names to components with type checking. Here's what the same component might look like with statically typed tag names:

use yew::tags::{H1, H2, H3, H4, H5, H6};

// this is just a marker trait to ensure that Heading is only used with heading tags. trait HTag {} impl HTag for H1 {} impl HTag for H2 {} impl HTag for H3 {} impl HTag for H4 {} impl HTag for H5 {} impl HTag for H6 {}

struct Props { tag: T, text: String, } struct Heading { props: Props, } impl Component for Heading { type Props = Props;
fn view(&self) -> Html {
    let Props { tag, text } = &self.props;
    html! {
        <@{tag}>{ text }</@>
    }
}
} Note that in this example the syntax for dynamic tag names no longer accepts strings, only members from the yew::tags. How this is to be achieved is still up for discussion but either way, this accomplishes type safety and doesn't depend on any runtime checks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/yewstack/yew/issues/1269#issuecomment-654823209, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKFSTPKPGEO4JDMAANZ2MBDR2MIRRANCNFSM4NNFF4SQ.

philip-peterson commented 4 years ago

I agree with what you say @siku2 , having the macro would be useful for construction using VTag::new. If we do have a tags module though it would be best IMO to have the tag names lowercased, because then when you see an title cased symbol you’ll know it’s a type / component.

For the example provided with the dynamic tag name, would that value passed into @{} not be a string currently? If in the new world, literal tag names are types, you could just construct a tag using the literal syntax but also the generic param, so it could be like <H>{ “foo” }</H> and you wouldn’t need a dynamic tag name or the prop either. Dynamic tag construction could still be useful for H{1,2,...,6} construction though, as you say, or using a prop to determine what kind of element will be the root of a component’s return value.

DasBlackfur commented 2 years ago

One more observation- although SVG has some elements that are defined as having capital letters in them such as , the browsers I tested (Chrome, Firefox, Safari) all respond to as well.

This might be a recent regression in browsers but it seems that this no longer holds true.

WorldSEnder commented 2 years ago

I have a somewhat different view on treating browser-behavior: We shouldn't try to get in the way, except for debug_assertions and compile time checks. When running in release mode, all bets can be off. At this point, we shouldn't insert a check if a dynamic void element contains children. Let the browser handle that. A custom defined element that contains illegal characters? Also let the browser handle this.

I also don't think a enumeration of "allowable" tag names will do much good, if it's for more than debug-assertions and linting. It will eventually get out of date, besides the number of svg elements already being rather large. This list doesn't have to leak into the final binary.

For capitalization, I'm not even sure about the rules the browsers currently implement. For example, an inline <svg> without a xml namespace declaration currently seems to follow normal HTML rules for contained elements, in Firefox. I don't think we can or should reproduce this rule set in yew. Just try our best to warn the programmer, follow an internal standard (i.e. all tag names are lowercase when coming from the literal macro) and reconcile in a straight-foward way, without attempting to normalize input we have no control over - at least in production mode.

Now, the special casing for "input" and "textarea" et al should stay and work regardless of casing.