vercel / next.js

The React Framework
https://nextjs.org
MIT License
127.09k stars 27.01k forks source link

URL in Metadata will encodeURLComponent unexpectedly. Url query `&` will encode into `&` automatically #52598

Open Innei opened 1 year ago

Innei commented 1 year ago

Verify canary release

Provide environment information

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.5.0: Thu Jun  8 22:21:34 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T8112
    Binaries:
      Node: 18.16.1
      npm: 8.19.4
      Yarn: 1.22.19
      pnpm: 8.6.2
    Relevant Packages:
      next: 13.4.9
      eslint-config-next: N/A
      react: 18.2.0
      react-dom: 18.2.0
      typescript: 5.1.6
    Next.js Config:
      output: standalone

Which area(s) of Next.js are affected? (leave empty if unsure)

App Router, Metadata (metadata, generateMetadata, next/head)

Link to the code that reproduces this issue or a replay of the bug

https://github.com/Innei/nextjs-metadata-url-encode-reproduction

To Reproduce

  1. next dev
  2. Open the home page's source view.

Describe the Bug

Next.js metadata url will encodeURLComponent unexpectedly.


export const generateMetadata = async (): Promise<Metadata> => {
  const ogUrl = new URL('http://localhost:3000/og.png')

  ogUrl.searchParams.set('title', 'Nextjs')
  ogUrl.searchParams.set('description', 'description - Nextjs')

  return {
    title: 'Nextjs',
    openGraph: {
      images: ogUrl,
    },
  }
}

and generate the code of <meta> tag.

<meta property="og:image" content="http://localhost:3000/og.png?title=Nextjs&amp;description=description+-+Nextjs"/>

The & will encode to &amp; unexpected.

Expected Behavior

No need to automatically encode HTML characters.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

NEXT-2729

joaolandino commented 1 year ago

Same here 🚨

patricktansg commented 1 year ago

any updates for this issues? Meanwhile, when will we able to inject inline script into head when using App Directory?

punarinta commented 1 year ago

Does anyone know if there's a workaround for this problem?

woodbridge commented 1 year ago

Can confirm that this issue still occurs

dianaKovalBetterme commented 8 months ago

are there any updates?

Edit by maintainer bot: Comment was automatically minimized because it was considered unhelpful. (If you think this was by mistake, let us know). Please only comment if it adds context to the issue. If you want to express that you have the same problem, use the upvote 👍 on the issue description or subscribe to the issue for updates. Thanks!

gnoff commented 8 months ago

@Innei Can you describe what you were doing that this escaping is a problem? Is there some metadata parser that cannot handle HTML escapes?

If you visit your example page you'll see that the browser correctly unescapes the sequence and you can treat the string as if it has an ampersand just fine.

By the way this is not something that Next.js is doing this is a behavior that React does and has done for a very long time. If React did not escape attributes when SSR'ing it would be very easy for anyone rendering user provided strings to be exposed to HTML injection attacks.

Have you (or anyone else in this thread) run into a practical issue where this escaping is breaking some tool?

hcorreia commented 8 months ago

For me the problem is mainly with Facebook openGraph images, but I'm pretty sure other fields have the same problem. Facebook sees &amp; instead of & and cannot fetch the image properly.

I had to create a endpoint that decodes a query param and does a redirect to the correct img url to workaround this issue. I can explain better if others need this, I don't remember where I found the idea. But it's a really ugly hack.

gnoff commented 8 months ago

If you are willing would you mind pointing me to a page hosting an OG image that fails b/c of this escape including which platform fails to fetch the image?

I'm curious because React has SSR'd these content attributes in meta tags exactly this way for years and I can't imagine OG images fail to be fetchable across every SSR'd react project. I wonder if the problem is actually more narrowly triggered either by specific tools or by something else happening in Next.js like double-escaping

mpereira commented 8 months ago

Hi @gnoff. I created a minimal reproduction of this issue here: https://github.com/mpereira/react-ssr-meta-tags-escape-bug

I'll echo below the details written in the README.

Since Next.js 13 it is possible to add Open Graph metadata via the generateMetadata function in a page.tsx file. In this project, app/page.tsx looks like the following:

const BASE_URL = "https://webhook.site/a4b648dd-64ad-481b-ac6e-3893ca157b1b";

export async function generateMetadata(_props) {
  const ogImage = `${BASE_URL}?foo=1&bar=2`;

  return {
    openGraph: {
      images: [
        {
          url: ogImage,
        },
      ],
    },
    twitter: {
      images: [ogImage],
    },
  };
}

export default function Home() {
  return (
    <main>
      <h1>react-ssr-meta-tags-escape-bug</h1>
    </main>
  );
}

Starting a development server with npm run dev and accessing view-source:http://localhost:3000 will show the following Open Graph tags:

<meta
  property="og:image"
  content="https://webhook.site/a4b648dd-64ad-481b-ac6e-3893ca157b1b?foo=1&amp;bar=2"
/>
<meta
  name="twitter:image"
  content="https://webhook.site/a4b648dd-64ad-481b-ac6e-3893ca157b1b?foo=1&amp;bar=2"
/>

The query string separator (?) is correctly not escaped in the resulting string, while the query string parameter separator (&) is incorrectly escaped to &amp;.

This results in a URL that won't behave as expected. One example is a dynamic Open Graph image that depends on URL query parameters for its appearance.

mpereira commented 8 months ago

Deployed to Vercel: view-source:https://react-ssr-meta-tags-escape-bug.vercel.app

Screenshot 2024-03-06 at 23 00 23

Innei commented 8 months ago

@Innei Can you describe what you were doing that this escaping is a problem? Is there some metadata parser that cannot handle HTML escapes?

If you visit your example page you'll see that the browser correctly unescapes the sequence and you can treat the string as if it has an ampersand just fine.

By the way this is not something that Next.js is doing this is a behavior that React does and has done for a very long time. If React did not escape attributes when SSR'ing it would be very easy for anyone rendering user provided strings to be exposed to HTML injection attacks.

Have you (or anyone else in this thread) run into a practical issue where this escaping is breaking some tool?

As @mpereira said above, we need to keep the & character in the query because when generating the OG we need to determine what type of image to use based on the query's parameters. Obviously, we need to provide a way to not escape characters because here we are ensuring that the information in the query is safe from the risk of XSS.

damian-balas commented 8 months ago

Same here :(

"next": "14.1.2",

hcorreia commented 8 months ago

@gnoff I created a example site that mimics what our production site did before the workaround.

Its basically what @mpereira already demonstrated.

Website: https://next-metadata-url-issue.vercel.app/ Code: https://github.com/hcorreia/next-metadata-url-issue/blob/main/src/app/page.tsx

Just a note, I think Facebook changed something in their scraper because now it's recognizing the image even with the &amps; but I can't prove it.

bashidagha commented 5 months ago

this problem still persist

Edit by maintainer bot: Comment was automatically minimized because it was considered unhelpful. (If you think this was by mistake, let us know). Please only comment if it adds context to the issue. If you want to express that you have the same problem, use the upvote 👍 on the issue description or subscribe to the issue for updates. Thanks!

guy032 commented 5 months ago

Any workaround for this?

jyxjjj commented 5 months ago

@gnoff
It is easy to reproduce: <meta name="description" content="i am a 'description' content."> Then you will view: i am a &#x27;description&#x27; content.

jyxjjj commented 5 months ago

Both in https://github.com/facebook/react/issues/13838 especially https://github.com/facebook/react/issues/13838#issuecomment-1241593329