vercel / next.js

The React Framework
https://nextjs.org
MIT License

RSC and CDN interaction makes next.js inefficient for highload projects #65335

Open dankain opened 5 months ago

dankain commented 5 months ago

Link to the code that reproduces this issue

https://codesandbox.io/p/devbox/rsc-test-m43xq4

To Reproduce

  1. Build the application with next build
  2. Start the application with next start
  3. Navigate to category 1
  4. Navigate to category 2 with hard refresh

Current vs. Expected behavior

Current Behavior

[Screenshot 2024-05-03 at 15 07 20]

Product 1 appears in category 1 and category 2

Currently the two requests return identical data but have different rsc hashes.

Expected Behaviour

If the data is the same there should be only one rsc hash.

With a high throughput global ecommerce site I want to cache identical data close to the customer in a CDN. The different rsc hashes mean that I will get CDN cache misses and traffic will have to go back to the server, which may be far from the customer and slower to respond.

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
  Available memory (MB): 65536
  Available CPU cores: 10
Binaries:
  Node: 18.18.0
  npm: 10.1.0
  Yarn: 1.22.19
  pnpm: N/A
Relevant Packages:
  next: 14.2.3 // Latest available version is detected (14.2.3).
  eslint-config-next: 14.1.0
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.4.5
Next.js Config:
  output: N/A

Which area(s) are affected? (Select all that apply)

Performance

Which stage(s) are affected? (Select all that apply)

next start (local)

Additional context

Hi, I'm trying to work out how to make the RSC requests work with a CDN when self-hosting Next.js. Is there any more info on this subject? I'm working on an ecommerce site with 100,000 products. Those products could appear in numerous product listing pages (PLPs, i.e. search result pages). Each of those PLPs is given a unique URL for SEO purposes. Take for example the following URLs:

/mens/
/mens/trainers
/mens/trainers/brand
/mens/trainers/brand?facet-price=%3A168

They could all contain the same products. Because of the way the rsc hash is calculated, I get a different _rsc param on each listing page, even though the content of the response is exactly the same.

product/299336/?_rsc=1vl30
product/299336/?_rsc=qe3go
product/299336/?_rsc=1vg99
product/299336/?_rsc=1stsw

I have even gone to different areas of the site (cart, checkout, order history), all with links back to the same product. Each produces a different _rsc param, yet the data returned is identical.

On a high throughput site I want to be able to cache identical data in the CDN close to the customer. At the moment that would be impossible, as there would be too many variations of the rsc hash for the caching to be effective.
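To make the cache dilution concrete, here is a minimal TypeScript sketch. It assumes a CDN that uses the full URL, query string included, as its cache key. The helpers `fullUrlKey`, `pathOnlyKey` and `countEntries` are hypothetical names introduced for illustration; the URLs are the four product requests listed above.

```typescript
// Hypothetical model: the CDN keys cached responses on the full URL,
// query string included.
function fullUrlKey(url: string): string {
  return url;
}

// For comparison only: key on the path and ignore the query string.
function pathOnlyKey(url: string): string {
  const q = url.indexOf("?");
  return q === -1 ? url : url.slice(0, q);
}

// Count how many distinct cache entries a set of requests would create.
function countEntries(urls: string[], key: (url: string) => string): number {
  return new Set(urls.map(key)).size;
}

const productRequests = [
  "product/299336/?_rsc=1vl30",
  "product/299336/?_rsc=qe3go",
  "product/299336/?_rsc=1vg99",
  "product/299336/?_rsc=1stsw",
];

// One entry per distinct _rsc value under the full-URL key.
const entriesWithQuery = countEntries(productRequests, fullUrlKey);
// A single shared entry if the query string is ignored.
const entriesPathOnly = countEntries(productRequests, pathOnlyKey);
```

Under these assumptions the same product occupies four cache entries instead of one. Keying on the path alone is shown only for comparison: it is safe only if the response really is identical for every value of the headers named in the Vary header.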

For solutions I think I only have 2 options:

  1. Cache all rsc requests in the CDN. This would end up caching loads of duplicate data and produce cache misses where there should be hits.
  2. Pass all requests through to the Next.js server. With this solution I would worry that the server would be overloaded at peak periods.

In both cases there would be an extra cost to the client.

I'm trying to understand why the rsc hash is different when the request always returns identical data. What is the purpose of this hash? As mentioned before, could this just be set to _rsc=1? We would also have issues with the Vary header, as it is currently returned like this:

Vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch, Next-Url

In this case Next-Router-State-Tree and Next-Url are set based on where you are coming from and do not necessarily have an impact on the data needed for the page you are going to. The Vary header will again have an impact on the CDN.

This issue has been raised in the following discussion thread: https://github.com/vercel/next.js/discussions/59167

NEXT-3327

samcx commented 5 months ago

@dankain This is what I get on the latest canary (after doing these exact steps).

  1. Load in next start
  2. Client-side navigate to category 1
  3. Navigate to /category/2

Can you confirm if you are seeing the same on the latest canary?

[Screenshot: CleanShot 2024-05-06 at 13 41 09@2x]

dankain commented 5 months ago

Yes, I have tried the latest canary and the result is the same. You need to go to category 1 and then category 2; above I only see the navigation to category 2. If you go to both categories you will see that there are 2 different rsc requests for the same product data.

dankain commented 5 months ago

Hi @samcx I have created a video to explain

https://github.com/vercel/next.js/assets/37213006/16e91eb0-947b-4634-8657-33162ded887f

RedVelocity commented 4 months ago

This is causing issues for me as well. The random rsc string at the end always results in a CF CDN cache MISS.

ztanner commented 4 months ago

> I'm trying to understand why the rsc hash is different when it is always returning identical data?

> In this case the Next-Router-State-Tree and Next-Url will be set on where you are coming from and do not necessarily have an impact on the data need for the page we are going to. The Vary header will again have an impact on the CDN

For some context, the ?_rsc hash is meant to mirror the Vary header. It was added because there are CDNs that don't honor the Vary header (example). If these headers don't change, the query parameter will be the same, signaling that given the same request headers, the response will be the same.
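As a rough illustration of what "mirroring the Vary header" could look like, the sketch below derives a short query value from the request headers named in the Vary header. This is not Next.js's actual implementation: the `rscParam` helper, the choice of the djb2 string hash, the base-36 encoding, and the sample header values are all assumptions made for illustration.

```typescript
// Request headers from the Vary header quoted earlier in this thread.
// RSC itself is left out on the assumption that it is constant for RSC requests.
const VARY_HEADERS = ["Next-Router-State-Tree", "Next-Router-Prefetch", "Next-Url"];

// djb2 string hash, used here only as a stand-in for whatever hash the framework uses.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    // hash * 33 + char code, kept as an unsigned 32-bit integer
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Hypothetical helper: the same header values in give the same query value out.
function rscParam(headers: Record<string, string | undefined>): string {
  const joined = VARY_HEADERS.map((name) => headers[name] ?? "").join(",");
  return djb2(joined).toString(36);
}

// Placeholder header values; real state trees are serialized router state.
const fromCategory1 = rscParam({
  "Next-Router-State-Tree": "tree-for-category-1",
  "Next-Url": "/category/1",
});
const fromCategory2 = rscParam({
  "Next-Router-State-Tree": "tree-for-category-2",
  "Next-Url": "/category/2",
});
```

With a scheme like this, navigating to the same product from two different categories yields two different query values, because the hash covers where the navigation started and not only where it is going.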

The reason the Vary header exists is because these headers actually can change the response from the server. For example:

This is obviously not a solution to the issue you're describing, but I wanted to provide some color as to why app router is doing this. If those responses are cached, and the RSC data for a page returns a tree that corresponds with the request from a different page, things will start behaving incorrectly. For example, see:

dankain commented 4 months ago

Thanks @ztanner, that helps with the context. Partial rendering is a great concept, but if it means I can't use a CDN effectively then that is an issue.

The problem with the next-router-state-tree is that only the server can interpret it. The state trees for each of my categories and products are all different, and only the server knows that they share the same layout. The transition from category 1 to product 1 and the transition from category 2 to product 1 would therefore need identical data, even with partial rendering. Would it be possible to have the same Vary header and RSC hash when the data is the same? Would that require the client to understand the layouts?

If this is not possible, do you have a suggestion on how to deploy Next.js for a global site? Our site is hosted in Europe but has southern hemisphere customers. Do I now need to somehow push the page building, HTML cache and data cache to an edge location?

wit221 commented 3 months ago

Experiencing the same issue, wherein identical RSCs produce different hashes when being Link-ed to from two different paths.

Without diving into the details of ?_rsc implementation and limitations, it sounds like a core blocker to achieving the above goal is the fact that both the RSC data and the RSC tree layout information are coupled together within one ?_rsc payload?

If so, is there a world where we decouple them and have, say, a ?_rsc_data and a ?_rsc_tree payload?

On a high level:

The two paths fetch the same ?_rsc_data=hash_a payload (which can be cached across paths), but they still fetch different ?_rsc_tree=hash_{b|c} payloads (which can be cached per path), since the tree info may be different for each path.

The goal would be to achieve caching of the data part of the _rsc payload across all paths (it's the more expensive one, presumably), and then the tree part of the _rsc payload can be cached per path.
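A minimal sketch of the idea, under a strong assumption: that the data part depends only on the target path, while the tree part also depends on the path the navigation started from. The `SplitKeys` type and the `splitKeys` helper are hypothetical; nothing like them exists in Next.js today.

```typescript
// Hypothetical pair of cache keys for the proposed split.
interface SplitKeys {
  dataKey: string; // would back ?_rsc_data
  treeKey: string; // would back ?_rsc_tree
}

// Assumption: data depends only on the destination, tree info depends on
// both the origin and the destination of the navigation.
function splitKeys(fromPath: string, toPath: string): SplitKeys {
  return {
    dataKey: `data:${toPath}`,
    treeKey: `tree:${fromPath}->${toPath}`,
  };
}

const viaCategory1 = splitKeys("/category/1", "/product/1");
const viaCategory2 = splitKeys("/category/2", "/product/1");

// The data key is shared across both paths; the tree key is per path.
const dataShared = viaCategory1.dataKey === viaCategory2.dataKey;
const treeDiffers = viaCategory1.treeKey !== viaCategory2.treeKey;
```

Whether the data part can really be made independent of the originating path is the open question here, since the router state can change what the server returns.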

1kuzus commented 3 months ago

Are there any further fixes?


inderjotx commented 1 month ago

Caching is crucial; at this point I am moving to Astro. F**k Vercel.

Systemcluster commented 1 month ago

> the ?_rsc hash is meant to mirror the Vary header. It was added because there are CDNs that don't honor the Vary header

As a cache-busting workaround for a subset of broken third parties, I don't see a good reason not to offer an option to disable it.

guillaume-fr commented 1 month ago

With proper CDN/proxy configuration, one should already be able to ignore ?_rsc in the CDN cache key and/or include the relevant headers. Unfortunately, you then either still have the same cache dilution issue, or you break everything if you neither include _rsc nor honor the Vary header. To actually fix this issue, the framework must allow a good cache hit ratio on an external cache.
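The trade-off can be sketched in TypeScript as a cache-key function that drops ?_rsc and keys on the header values instead. This is an illustration, not a configuration for any particular CDN: `stripRsc` and `normalizedKey` are hypothetical helpers, header names are assumed to arrive with the exact casing shown, and the header values are placeholders.

```typescript
// Headers from the Vary header quoted earlier in this thread.
const KEY_HEADERS = ["RSC", "Next-Router-State-Tree", "Next-Router-Prefetch", "Next-Url"];

// Remove the _rsc query parameter and keep every other parameter untouched.
function stripRsc(url: string): string {
  const q = url.indexOf("?");
  if (q === -1) return url;
  const path = url.slice(0, q);
  const kept = url
    .slice(q + 1)
    .split("&")
    .filter((pair) => pair !== "" && pair !== "_rsc" && !pair.startsWith("_rsc="));
  return kept.length > 0 ? `${path}?${kept.join("&")}` : path;
}

// Hypothetical cache key: URL without _rsc, plus the values of the Vary headers.
function normalizedKey(url: string, headers: Record<string, string | undefined>): string {
  const headerPart = KEY_HEADERS.map((name) => `${name}=${headers[name] ?? ""}`).join("|");
  return `${stripRsc(url)} ${headerPart}`;
}

// Same headers, different _rsc values: the keys collapse into one entry.
const fromCat1a = normalizedKey("/product/299336/?_rsc=1vl30", { RSC: "1", "Next-Url": "/category/1" });
const fromCat1b = normalizedKey("/product/299336/?_rsc=qe3go", { RSC: "1", "Next-Url": "/category/1" });
// Different Next-Url: still a separate entry, so the dilution remains.
const fromCat2 = normalizedKey("/product/299336/?_rsc=qe3go", { RSC: "1", "Next-Url": "/category/2" });
```

Ignoring ?_rsc only removes the redundancy between the query parameter and the headers. As long as the headers are part of the key, requests for the same product from different pages still land in different cache entries.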

One option might be to change the request flow and the cache header logic (Vary...) so that CDNs, proxies and browsers get a decent cache hit ratio by default. Another might be to document how cache processing and response construction depend on Next-URL, next-router-state-tree... so that a third party can re-implement that custom logic. If data are required for that processing, Next.js should provide a convenient way to export them (API, build time output...).