RFC: Managing tags without extra markup

tizmagik commented 4 years ago

Problem

Currently, react-head utilizes data-rh attributes in order to manage the tags that it creates. By default, react-head avoids touching tags that it doesn't know about (that don't have data-rh attributes). This is so that static tags that are part of a static HTML template are not touched.

There's been a few issues (both in this project and react-helmet) from folks who are interested in: 1) Avoiding the extra markup, https://github.com/tizmagik/react-head/pull/84 https://github.com/tizmagik/react-head/issues/40 https://github.com/nfl/react-helmet/issues/79 2) Having react-head manage tags that it doesn't explicitly own (e.g. a default fallback title tag that's defined as part of a static HTML template), https://github.com/tizmagik/react-head/issues/83

While these two items are certainly separate, they are closely related and we might be able to kill two birds with one stone depending on our approach.

Potential Solutions

Option 1: Manually created whitelist definition

One solution, as proposed in #84 is to define an explicit whitelist prop:

/* on server */
const headTags = [];
<HeadProvider headTags={headTags} whitelist={['title', '[name="description"]', '[property^="og:"]'}>
  <App />
</HeadProvider>

/* on client */
<HeadProvider whitelist={['title', '[name="description"]', '[property^="og:"]'}>
  <div id="app">
    <Title>Title of page</Title>
    // ...
  </div>
</HeadProvider>

This solution has the following benefits:

1) Easily implementable 1) Completely opt-in

While this works, it has the following drawbacks:

1) Creating that whitelist prop isn't straight-forward (you have to be familiar with uncommon CSS selectors, [foo^=bar]) 1) You have to provide this same selector set to the server and the client (more manual overhead) 1) It doesn't explicitly solve for Problem 2 stated above (although it could depending on implementation).

Implementation Details

The implementation can be explored in the WIP PR by @CanRau

Option 2: Auto-Whitelist

Expanding on Option 1, we could auto-generate the necessary "selector set" as you go (opt-in to this behavior with autoWhitelist prop on HeadProvider):

/* server */
const headTags = [];
<HeadProvider headTags={headTags} autoWhitelist>
  <App>
    // ... within any component, head tags are rendered as normal
    <Meta name="blah-long-name" content="lots of content here" />
    <Title>Title of page</Title>
  </App>
</HeadProvider>

/* on client */
<HeadProvider>
  <App />
</HeadProvider>

Note that autoWhitelist prop isn't very expressive/meaningful without context. Maybe we can come up with a better name? raw? clean? Any other ideas?

Implementation Details

On the server, as we render head tags, if any of them include a whitelist prop, we build up the selector set necessary to identify which tags were inserted by react-head (basically the selector set that was manually created in Option 1 above). We return this as a renderable DOM element as part of the headTags[] array with some unique ID that we grab from the DOM and provide as context during client rendering:

const headTags = [
  // normal head tags
  '<meta name="blah" content="foo" />',
  '<title>Title of page</title>',

  // new: the generated whitelist selector set, to be injected during SSR
  // we'd basically JSON.parse() this content on the client and use
  // as part of the querySelectorAll: 
  `<script id="react-head-whitelist" type="text/template">
    [
      'meta[name="blah-long-name"][content="lots of content here"]',
      'title'
    ]
  </script>
  `
]

CONs: 1) Sort of bloats the server rendered payload (we'd basically duplicate all tags twice). There's some optimizations we could do here, e.g. select just the first few characters of each attribute instead of the full thing: meta[name^="blah"][content^="lots"]

PROs: 1) No additional setup required (as @CanRau suggested, the API doesn't need to change much!) 1) Completely opt-in so it's non-breaking (current behavior would be the default). Maybe we can flip this with a future major release. 1) Depending on how we generate the whitelist selector set, we could also solve for Problem 2 (e.g. in the above example the selector for the title tag is just title, instead of also including the full text content). Alternatively we could force opt-in to mangling by requiring an additional specific prop, <title mangle>Title of page</title> but not crazy about that idea.

Option 3: Individually opt-in to whitelist

As a tweak to Option 2, we could individually require opt-in, if we wanted to support whitelisting only individual elements, we could support an explicit whitelist prop on each head tag instead of an autoWhitelist prop on the HeadProvider.

  <title whitelist>Title of Page</title>

In this scenario you'd have a mix of data-rh and #react-head-whitelist selectors.

Not sure how popular this option would be and it complicates the implementation slightly so I'd almost rather have it be all or nothing. Thoughts?

PRO:

Opt-in
Selective SSR bloat vs all

CON:

Complicates implementation
Potentially confuses the mental model for users

Option 4: The Nuclear Option

The above assume that folks don't want react-head to manage tags that it hasn't explicitly rendered. There is a "nuclear" option of allowing folks to opt-in to having react-head mangle all tags:

/* server */
const headTags = [];
<HeadProvider headTags={headTags} manageAllTags>
  <App>
    // ... within any component, head tags are rendered as normal
    <Meta name="blah-long-name" content="lots of content here" />
    <Title>Title of page</Title>
  </App>
</HeadProvider>

/* on client */
<HeadProvider manageAllTags>
  <App />
</HeadProvider>

If manageAllTags is set, we can avoid rendering data-rh attributes and just have a very greedy selector on the client, something like: head > title, head > meta ... which would select all tags in <head /> whether or not they were created by react-head. This would work but would require that users express all their tags in the React tree and not just rely on static HTML templates. This is okay if we make this opt-in and explain thoroughly in the documentation.

PROs:

Simple to implement
Simple mental model
No SSR payload bloat

CONs:

All or nothing
May be surprising for folks who don't read the docs :)

tizmagik commented 4 years ago

I think I'm a fan of Option 2. What are your thoughts @CanRau @TrySound @rdsedmundo @TrejoCode @JoshuaToenyes ? Any other options you can think of? Would love your thoughts!

EDIT: I added an Option 4 above, the nuclear option 💣 😁

TrejoCode commented 4 years ago

For my use, option 2 favors me and in general, it seems like a simple solution to implement.

JoshuaToenyes commented 4 years ago

Very nicely described. Here are my thoughts on each option:

Option 1 - Manually created whitelist

Specifically on the "cons" list:

I don't think the whitelist syntax ([foo^=bar]) would be too complicated, and there are loads of resources that could be linked to for the unfamiliar.
A shared constant between client/server implementation could be used for the whitelist (we do this very frequently).
For solving Problem 2, seems like if I set whitelist={['title']} then any existing duplicate <title> tag would be clobbered on the client when rendered, right?

In general, I like the explicit approach Option 1 provides.

Option 2 - Auto-Whitelist

~I can see how this would be attractive... but honestly I much prefer the data-rh attributes over adding a whole new script tag with duplicate title tags. Even if the script tag was minimized using the first few characters, I still would prefer the data-rh approach.~

~I agree that data-rh isn't "pretty" and adds extra markup, but a script tag is much more "markup" than extra data attributes.~

~So this option isn't one I would prefer, and probably wouldn't use. If the switch was flipped to make this default behavior, I think a why to "opt-out" of this behavior would be great, especially for people who always do SSR.~

After being enlightened by @CanRau, (and reading the React Helmet issue linked above more carefully), I'm revising my opinion of option 2. This way ahead could actually be the best approach.

One possible solution to the overhead of including the tag's content would be to perform a very simple hash on the tag and/or attribute values. We use this algorithm for string hashing on both client and server (although we usually convert it to a string, like Math.abs(hash).toString(36)). It's super compact and fast.

// *server* would be a series of hashes like the below on each tag
hash('meta[name="blah-long-name"][content="lots of content here"]') => '[some short hash]'

// *client*
<script id="react-head-whitelist" type="text/template">
  [
    '[short hash]',
    '[short hash]',
    ...
  ]
</script>

On the client side, it would simply re-hash each tag checking for a match. If it matches, it's managed.

To make debugging easier, it would make sense to add an attribute or comment someplace indicating which tags are managed when NODE_ENV !== 'production'.

One other note... should the type attribute of that script tag be application/json instead of text/template?

Option 3 - Individually opt-in whitelist

I don't prefer this option since it requires "global" thinking at each component. If I'm writing a component that modifies the head tags, I don't really want to be thinking about how those head tags will be rendered.

In other words, I think the "whitelist" behavior should be abstracted away at the individual component level.

Option 4 - Nuclear!

I honestly like this approach:

Existing behavior is unaffected.
It removes the extra data-rh markup.
Doesn't require any additional page bloat (i.e. script tag).
Doesn't require any additional props at the component level.
It requires the least mental gymnastics. If you're using react-head to manage the head tags, and you want it to manage all the head tags, just flip the switch. It'll just work.

There's one case where I can imagine this wouldn't work well: If you're building some hybrid React application where React is only responsible for some portion of the page and not the entire page. For example, if I have a WordPress site and embed some React powered widget that uses react-head to manage head tags of the outer page. In this case, if you enable the nuclear option, it could potentially clobber the outer page's head tags. (But it's opt in... so... it's their own fault?)

Proposal: Option 5 - Nuclear + Whitelist

What if we had a hybrid Option 1 + Option 4 approach, with a manageAll and whitelist props? The manageAll prop would trigger the nuclear option, giving react-head permission to manage all head tags on the page. If the user needed more fine-grained control over what tags react-head is allowed to clobber, they could use the whitelist prop.

In this case, it should be an error to specify both the whitelist and manageAll props (mutually exclusive).

I like this approach because I think it provides a way to solve the problems presented and is incremental: existing behavior is preserved, users may allow react head to manage some or all of the head tags.

CanRau commented 4 years ago

I've to add that that CSS like whitelist won't work like that at the moment and I don't know a simple solution to get it working as mentioned here https://github.com/tizmagik/react-head/pull/84#issuecomment-537728374 and possible workarounds here https://github.com/tizmagik/react-head/pull/84#issuecomment-541687323

I'm actually open to any of those 😁I like the use case of a fallback a lot as well 👍

Prop. 5: Nuke + whitelist sounds good but might be overkill to add both I don't know

The manageAll (I'd be fine with that name) could give more control by "just" adding everything using react-head you'd like it to manage, removing the possibility of duplicates

@JoshuaToenyes just a side note, maybe you already know(?!), it's not only about beauty but fear of SEO issues due to some software not properly scraping meta tags with additional attributes, that's at least what many are talking about in the above linked react-helmet issue

JoshuaToenyes commented 4 years ago

@CanRau Ah, I wasn't aware of the SEO concern or scraping problems on the data attributes. I will read through that issue more carefully!

Our software powers quite a few sites (~100) most using react-head and a few running react-helmet. I haven't noticed an SEO impact using either... but I'm also not doing any detailed SEO analysis on head tags either... I would be very interested if there's any data on that.

CanRau commented 4 years ago

Me too actually, yet until that exists I think it’s desirable to take the safe route if possible :wink:

JoshuaToenyes commented 4 years ago

Thank you for pointing that out @CanRau. I revised my opinion on Option 2 above :)

CanRau commented 4 years ago

I know it's christmas just wanted to check for any updates on this?

tizmagik commented 4 years ago

Hey @CanRau thanks for following-up. Apologies but I haven't made progress on this, however, I think based on the discussion here we might at least have a path forward that I'd be open to a PR to add by someone, or I will try and get to myself if I get time.

Anyway, here's the key factors of what I think to be the best approach:

Easy mental model for users
Low overhead (both in terms of HTML payload and the developer experience)
Remove the need for data-rh attributes alltogether
Ability to manage all tags easily

I think a sort of super-hybrid of the Nuclear option and Auto-whitelist might work, sort of in line with the hybrid approach that @JoshuaToenyes suggested but combined to a single prop interface. Let me know what you think:

<HeadProvider headTags={headTags} manageTags={option}>

Where option is one of:

all: All tags are managed (we can make this the default in the next semver major)
types: Only the tags that are defined by react-head somewhere in the tree. For example, if you define a title tag via react-head, then all title tags are managed, while if you never specify any meta tags, then meta tags will remain untouched. This will avoid data-rh attributes.
specific: Individual and specific tags will be managed by react-head, this is equivalent to the current behavior. Requires using data-rh attributes.

What do you all think? Is this interface confusing? Does it cover all bases?

CanRau commented 4 years ago

Think that looks good 👍Not sure when I can find time to look into it though 😿 I'm working on a project again right now where I'm already using react-head with the Gatsby plugin, that might help get me working on this, no promises though 😅 At the moment I think I'd use types the most, as for the all I'd have to check if it'd mess with gatsby-plugin generated meta tags..

types may be called something like own or defined? Idk^^

tizmagik commented 4 years ago

types may be called something like own or defined? Idk^^

Yea, agreed, “types” is kind of a weird name. Open to other suggestions as well...

osdiab commented 4 years ago

Ran into this issue at my company - I'm glad to help but not sure what's the best avenue. For now planning to just wipe out the existing head tags on JS load to simulate the all managed option in our app, but not having the data attributes would be nice.

Thought for the "inferred"/"types" option (maybe infer can be the option name?) - is it planning to be at the granularity of "any meta tag" or something more specific like "meta tags with properties that were used"? If the latter, technically opengraph tags can be used as an "array" by having multiple of the same tag defined, which I don't personally use, but I guess wiping over existing stuff might do unexpected things for some users.

If a direction can be set then would be glad to try to contribute an implementation. Thanks!

tizmagik commented 4 years ago

Thanks for your interest, @osdiab ! Please feel free to submit a PR on this if you're up for it.

Thought for the "inferred"/"types" option (maybe infer can be the option name?)

I'm fine with infer as the option name, that works too. So it would be one of: all, infer or specific. Sounds good 👍

is it planning to be at the granularity of "any meta tag" or something more specific like "meta tags with properties that were used"?

infer would opt react-head to manager all tags that were defined somewhere in the tree. So if your tree looked something like this:

<HeadProvider headTags={headTags} manageTags="infer">
  <App>
    // ... within any component, head tags are rendered as normal
    <Meta name="blah-long-name" content="lots of content here" />
    <Title>Title of page</Title>
  </App>
</HeadProvider>

And your HTML template looked something like this:

<head>
   <title>Fallback title</title>
   <link rel="stylesheet" type="text/css" href="theme.css">
</head>

Then the resultant tags on the page, after client rehydration would look like this:

<head>
   <meta name="blah-long-name" content="lots of content here" />
   <title>Title of page</title>
   <link rel="stylesheet" type="text/css" href="theme.css">
</head>

Note that meta and title tags, since they were defined in the JS are "opted-in" to be managed by react-head and any pre-existing tags in the HTML template (e.g. title) were removed. Meanwhile, since no <Link /> react-head tags were defined, the link tags were unaffected.

Let me know if that's still unclear. Thanks!

duythinh-dev commented 2 years ago

I need to help :(
Im making website https://www.hahalolo.com/tour I setting all meta tags but when i check in developers.facebook.com/tools/debug
i check on fb it returns me results of website hahalolo.com

tizmagik / react-head