vercel / next.js

The React Framework
https://nextjs.org
MIT License

Pages with utf-8 name don't work properly under SSR #10084

Open frei-0xff opened 4 years ago

frei-0xff commented 4 years ago

Bug report

Pages with utf-8 non-ASCII characters in their name don't work properly under SSR

Describe the bug

Pages with non-ASCII UTF-8 characters in their name work just fine with client-side navigation, but when rendered on the server they return "404 This page could not be found."

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Create page 'pages/тест.js'
  2. Navigate to http://localhost:3000/тест
  3. See error "404 This page could not be found."

Expected behavior

I'm expecting to see page 'pages/тест.js' rendered

System information

Additional context

Minimal repository to reproduce bug: https://github.com/frei-0xff/nextjs-utf8-pagename

StarpTech commented 4 years ago

What's the purpose of using non-ASCII chars if your page name should be displayed as a valid URL?

http://localhost:3000/тест is converted to http://localhost:3000/%D1%82%D0%B5%D1%81%D1%82 and can't be found.
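
The conversion described here is ordinary percent-encoding of the UTF-8 bytes. A minimal sketch that reproduces it with the built-in encodeURI and decodeURI functions (the variable names are only illustrative):

```typescript
// Percent-encoding turns each UTF-8 byte of a non-ASCII character into %XX.
const pageName: string = "тест";
const encodedPath: string = encodeURI("/" + pageName);
const decodedPath: string = decodeURI(encodedPath);

console.log(encodedPath); // "/%D1%82%D0%B5%D1%81%D1%82"
console.log(decodedPath === "/" + pageName); // true
```

The two forms name the same resource, so a router has to compare them in one consistent form, either both encoded or both decoded.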

kachkaev commented 4 years ago

@StarpTech you might want to have a link like http://яндекс.рф/тест. These display as Cyrillic URLs in modern browser tabs. From my experience, тест turns into %D1%82%D0%B5%D1%81%D1%82 only when you copy the URL to the clipboard.

frei-0xff commented 4 years ago

@StarpTech non-ASCII URLs are displayed properly in all modern browsers and are used by popular sites, for example wikipedia.org.

StarpTech commented 4 years ago

Thanks for the examples. I have never used it.

Itzik7 commented 4 years ago

As a workaround, you can use a dynamic page ([page]) and switch on the UTF-8 page name inside it.
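
A rough sketch of that idea, assuming the dynamic route hands the raw parameter to a helper. The function name resolvePage and the page keys are hypothetical and not part of Next.js:

```typescript
// Hypothetical page keys; a real app would return components or props instead.
type PageKey = "test" | "greeting" | "not-found";

// Resolve the raw [page] route parameter to a known page.
function resolvePage(rawParam: string): PageKey {
  let pageName: string;
  try {
    // The parameter may arrive percent-encoded or already decoded.
    pageName = decodeURIComponent(rawParam);
  } catch (err) {
    // Malformed escape sequences make decodeURIComponent throw a URIError.
    return "not-found";
  }
  switch (pageName) {
    case "тест":
      return "test";
    case "hæ":
      return "greeting";
    default:
      return "not-found";
  }
}

console.log(resolvePage("%D1%82%D0%B5%D1%81%D1%82")); // "test"
console.log(resolvePage("тест")); // "test"
```

Decoding first means the switch only has to list the readable names, whichever form the router passes in.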

frei-0xff commented 4 years ago

In version 9.2, client-side routing for pages with non-ASCII characters worked just fine. The issue was only with server-side routing, which could be worked around with a custom server.js that calls decodeURI(parsedUrl.pathname).
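
A minimal sketch of the decoding step in such a custom server. The helper name decodePathname is hypothetical, and the commented lines only indicate where it would sit in a typical custom server, they are not a tested setup:

```typescript
// Decode a request pathname so it matches a page file such as pages/тест.js.
// Falls back to the raw pathname when it contains a malformed escape sequence.
function decodePathname(pathname: string): string {
  try {
    return decodeURI(pathname);
  } catch (err) {
    return pathname;
  }
}

// In a custom server.js the result would be handed on to Next.js, for example:
//   const parsedUrl = parse(req.url, true);
//   app.render(req, res, decodePathname(parsedUrl.pathname), parsedUrl.query);
console.log(decodePathname("/%D1%82%D0%B5%D1%81%D1%82")); // "/тест"
```

The try/catch matters because decodeURI throws a URIError on input such as "/%E0%A4%A", which would otherwise crash the request handler.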

After updating to version 9.5.1, client-side routing for pages with non-ASCII characters stopped working entirely. In development mode, clicking a link to such a page causes no navigation and shows no error message. After the routeChangeStart event, neither routeChangeComplete nor routeChangeError is fired; only after clicking another link is routeChangeError fired, with "Error: Route Cancelled".

Edit: It seems that https://github.com/vercel/next.js/pull/14827 was the breaking change, because URLs returned by the WHATWG URL API are percent-encoded, which is inconsistent with other parts of the code.
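
The encoding behaviour of the WHATWG URL API can be checked in isolation. This sketch only illustrates the mismatch and does not reproduce the code from that pull request; registeredPage is a stand-in for however a page path is stored internally:

```typescript
const requestUrl = new URL("http://localhost:3000/тест");

// The WHATWG URL parser percent-encodes non-ASCII path characters.
console.log(requestUrl.pathname); // "/%D1%82%D0%B5%D1%81%D1%82"

// A page registered under its decoded name no longer matches the raw pathname.
const registeredPage = "/тест";
console.log(requestUrl.pathname === registeredPage); // false
console.log(decodeURI(requestUrl.pathname) === registeredPage); // true
```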

jonrh commented 3 years ago

Tested v9.5.0, v9.5.5, and v10.0.1 and none of them support statically generated pages with non-ASCII names like /тест and /hæ. It worked as expected in a few versions I tested between v9.0.0 and v9.4.4. I would classify this as a bug or an undocumented breaking change in v9.5.

fillon commented 3 years ago

I am experiencing the same issue with a route in Thai.

Is there a workaround?

next-10.0.3

jonrh commented 3 years ago

Tested again and found something really peculiar. It works as expected when deployed on Vercel. It does not work locally with either next dev or next build && next start; both return a 404 error.

Sample code: https://github.com/jonrh/next-unicode-bugs
Sample live website: https://next-unicode-bugs.vercel.app/

Video showing it working on Vercel: https://user-images.githubusercontent.com/58344/103299546-b4da8600-49f4-11eb-8dd9-92ffd8536407.mov

I would also like to clarify that this is only testing static routes, not server side rendering (SSR/SSG) as the title of this issue states.

tyteen4a03 commented 3 years ago

I can trigger this issue with 9.5+. Is there any ETA on this? I really want to upgrade to React 17 and webpack 5.

andreyshedko commented 3 years ago

In case this helps someone, I fixed this issue the following way:

// Inside getStaticPaths: percent-encode each tag name before returning it as a path param.
const res = await fetch(`${process.env.HOST}/api/tags/read`, headers);
const data = await res.json();
let paths: { params: ParsedUrlQuery }[] = [];
if (Array.isArray(data)) {
  paths = data.map((tag: Tag) => ({
    params: { tag: encodeURI(tag.tagName) },
  }));
}

return {
  paths,
  fallback: false,
};

aynik commented 3 years ago

As a workaround, I used rewrites in next.config.js:

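// next.config.js
module.exports = {
  // Map the percent-encoded form of each non-ASCII path to an ASCII-named page.
  async rewrites() {
    return [
      {
        source: `/${encodeURIComponent('カート')}`,
        destination: '/cart',
      },
      {
        source: `/${encodeURIComponent('アカウント')}`,
        destination: '/account',
      },
    ]
  },
}

JanDez commented 3 years ago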

> Tested v9.5.0, v9.5.5, and v10.0.1 and none of them support statically generated pages with non-ASCII names like /тест and /hæ. It worked as expected in a few versions I tested between v9.0.0 and v9.4.4. I would classify this as a bug or an undocumented breaking change in v9.5.

That happened to me with words containing ñ and accented (´) characters.

ShahriarKh commented 3 years ago

For me, decodeURI is the answer:

export async function getStaticPaths() {
   const { posts } = await request(CMS, POSTS);

   const paths = posts.nodes.map((post) => ({
      params: { slug: decodeURI(post.slug) },
   }));

   return { paths, fallback: false };
}

nbouvrette commented 3 years ago

We just released a new package that overcomes this issue (and many others): https://github.com/Avansai/next-multilingual

Looking forward to hearing feedback on our approach.

Tobeyforce commented 2 years ago

While this package shows some promise, shouldn't international URLs be supported by default? Internationalization is the concept of supporting multiple languages, which has nothing (or maybe only a little) to do with UTF-8-based URLs.

It looks like this package enforces, for example, a language prefix on every URL, e.g. /fr/my-international-url. I think it's quite simple: international URLs should be supported by default.

For example, I want to use the Swedish characters å, ä, and ö and have a URL called /påsk.

This doesn't work. However, if I name my page p%C3%A5sk it works... until I use getStaticPaths, then it breaks. Not to mention that ISR revalidation doesn't work either. This really causes a ton of confusion.

Using the rewrites approach above is also not really feasible when you have multiple pages using e.g. getStaticPaths. Rewrites also interfere with packages like next-sitemap.

rinarakaki commented 1 year ago

I keep getting this error when I go to a non-ASCII path in local dev mode (npm run dev) while trying to use Dynamic Routes with the App Router:

TypeError: Cannot convert argument to a ByteString because the character at index 8 has a value of <value> which is greater than 255.
    at Object.fetch (node:internal/deps/undici/undici:16287:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async invokeRequest (/path/to/node_modules/next/dist/server/lib/server-ipc/invoke-request.js:21:12)
    at async invokeRender (/path/to/node_modules/next/dist/server/lib/router-server.js:229:29)
    at async handleRequest (/path/to/node_modules/next/dist/server/lib/router-server.js:422:24)
    at async requestHandler (/path/to/node_modules/next/dist/server/lib/router-server.js:439:13)
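
This message is the one fetch raises when a value that must be a ByteString (every character code at most 255), such as an HTTP header value, contains a character above that range. Which argument triggers it inside invokeRequest is not visible in the trace, so the sketch below only illustrates the constraint; the helper name and the sample path are hypothetical:

```typescript
// Index of the first character whose code unit exceeds 255, or -1 if none.
function firstNonByteStringIndex(value: string): number {
  for (let i = 0; i < value.length; i++) {
    if (value.charCodeAt(i) > 255) {
      return i;
    }
  }
  return -1;
}

const rawPath = "/posts/こんにちは"; // hypothetical non-ASCII path
console.log(firstNonByteStringIndex(rawPath)); // 7
console.log(firstNonByteStringIndex(encodeURI(rawPath))); // -1
```

Percent-encoding the path before it is placed in such a field keeps every character within the allowed range.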

rinarakaki commented 9 months ago

Any updates?

coffeecupjapan commented 8 months ago

I only took a quick look at this problem, but it seems that Next.js, at build time, replaces any non-word characters with an empty string, so non-ASCII words cannot be used, I assume.

https://github.com/vercel/next.js/blob/7dbb66f390ab54e4b5c4c632a626cbcd6fd8f271/packages/next/src/shared/lib/router/utils/route-regex.ts#L114-L116

Does anyone want to add support for non-ASCII (e.g. UTF-8) words, at least here (and presumably in more places; I cannot track down every dependency, sorry)?

https://github.com/vercel/next.js/blob/7dbb66f390ab54e4b5c4c632a626cbcd6fd8f271/packages/next/src/shared/lib/router/utils/route-regex.ts#L81-L97
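
To illustrate the effect being described, assuming the replacement in question is of the form replace(/\W/g, ''), which is not confirmed here: in JavaScript, \W is ASCII-only, so it strips every non-Latin letter. The Unicode-aware pattern below is a hypothetical alternative and not a proposed patch:

```typescript
// \W is ASCII-only in JavaScript: it matches every character outside [A-Za-z0-9_].
const asciiOnly = /\W/g;
// Hypothetical Unicode-aware alternative: keep letters and digits of any script.
const unicodeAware = new RegExp("[^\\p{L}\\p{N}_]", "gu");

console.log("тест".replace(asciiOnly, "")); // ""
console.log("hæ".replace(asciiOnly, "")); // "h"
console.log("тест".replace(unicodeAware, "")); // "тест"
console.log("my-slug".replace(unicodeAware, "")); // "myslug"
```

With the ASCII-only pattern, a name written entirely in Cyrillic collapses to an empty string, which would be consistent with non-ASCII names not being usable there.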

abdessamadely commented 6 months ago

In my case, I encountered this issue with Arabic pathnames. After debugging a little, I noticed a mismatch between dev and export (during validation, I think). As a workaround, I did the following:

process.env.NODE_ENV === 'development' ? encodeURI(page) : page
// or
process.env.NODE_ENV === 'development' ? encodeURI(page) : decodeURI(page)

In dev, I encode the pathname so that it matches what the Next server has, but on export/build I give it the value I want for the generated filename.

Full example:

const pages = ['من-نحن', 'سياسة-الخصوصية', 'الشروط-والأحكام']
export async function generateStaticParams() {
  return pages.map((page) => ({
    pathname: process.env.NODE_ENV === 'development' ? encodeURI(page) : page 
  }))
}

Arctomachine commented 2 days ago

What exactly is the main source of this problem? Perhaps we could come up with a solution together and fix it by the v15 stable release.