vercel / next.js

The React Framework
https://nextjs.org
MIT License

fix: _rsc should be unique per server build #67229

Open snyamathi opened 4 days ago

snyamathi commented 4 days ago

This PR adds the currentBuildId to the uniqueCacheQuery which is sent as the _rsc query parameter for RSC payload requests.

tl;dr: the promise of _rsc as a "unique cache query" doesn't hold across builds, which breaks client-side navigation.

The _rsc query parameter is a function of 3 values, none of which are unique per-build.

const uniqueCacheQuery = hexHash(
  [
    headers[NEXT_ROUTER_PREFETCH_HEADER] || '0', // "0" | "1"
    headers[NEXT_ROUTER_STATE_TREE], // ["",{"children":["__PAGE__",{},"/","refresh"]},null,null,true]
    headers[NEXT_URL], // "/"
  ].join(',')
)

// ...

// Add unique cache query to avoid caching conflicts on CDN which don't respect to Vary header
fetchUrl.searchParams.set(NEXT_RSC_UNION_QUERY, uniqueCacheQuery)

We're using a full CI/CD pipeline with frequent blue/green deployments, each shifting traffic over a period of time.

Because of this, a page served from the blue deployment can make an RSC payload request that is served by the green deployment (or vice versa).

When this happens, the browser's currentBuildId does not match the response's buildId, and we hit a condition that forces a hard (MPA) navigation, i.e. a full page load instead of a client-side transition.

When mpaNavigation flag is set do a hard navigation to the new url

    if (currentBuildId !== buildId) {
      return doMpaNavigation(res.url)
    }

This is good because it prevents the page from breaking in insidious ways, but if we can avoid this by getting the correct RSC payload, that would be best.

There are options like ALB stickiness, which uses a cookie to route requests to the same target, but this breaks when caching is involved.

The Vary header sent (like the _rsc query param) will be the same for both the blue and green deployments. Whichever deployment generates the cached page first will poison the cache for any other deployment where the same state tree exists. A lower cache TTL mitigates this, but only to an extent, and each reduction comes with its own drawbacks.
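The poisoning scenario can be illustrated with a toy cache. This is a minimal sketch, not how any particular CDN works: `SketchCdn`, `cacheKey`, the `_rsc=abc12` value, and the "blue"/"green" build IDs are all hypothetical, and the cache is assumed to key on the URL plus the headers listed in Vary.

```typescript
// Hypothetical model of a CDN cache keyed on URL plus the Vary'd request headers.
type RequestHeaders = Record<string, string>;

interface CachedPayload {
  buildId: string;
  body: string;
}

// Header names taken from the Vary response header, lowercased.
const VARY_HEADERS = [
  "rsc",
  "next-router-state-tree",
  "next-router-prefetch",
  "accept-encoding",
];

function cacheKey(url: string, headers: RequestHeaders): string {
  const varied = VARY_HEADERS.map((header) => `${header}=${headers[header] ?? ""}`);
  return [url, ...varied].join("|");
}

class SketchCdn {
  private store = new Map<string, CachedPayload>();

  // Returns the cached payload on a hit; otherwise asks the origin and caches the result.
  fetch(url: string, headers: RequestHeaders, origin: () => CachedPayload): CachedPayload {
    const key = cacheKey(url, headers);
    const hit = this.store.get(key);
    if (hit !== undefined) return hit;
    const fresh = origin();
    this.store.set(key, fresh);
    return fresh;
  }
}

// Both deployments produce the same URL and headers for the same state tree.
const rscUrl = "/?_rsc=abc12"; // hypothetical value, identical across builds
const rscHeaders: RequestHeaders = {
  rsc: "1",
  "next-router-state-tree": "%5B%22%22%5D", // placeholder, not a real tree
  "next-router-prefetch": "1",
  "accept-encoding": "gzip, deflate",
};

const cdn = new SketchCdn();
// A page from the green build asks first and fills the cache.
const greenResult = cdn.fetch(rscUrl, rscHeaders, () => ({ buildId: "green", body: "green payload" }));
// A page from the blue build asks next and is served green's payload from cache.
const blueResult = cdn.fetch(rscUrl, rscHeaders, () => ({ buildId: "blue", body: "blue payload" }));

console.log(greenResult.buildId); // "green"
console.log(blueResult.buildId); // "green"
```

In this model the blue page receives a payload whose buildId differs from its own, which is exactly the case that triggers the hard navigation described earlier.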

For this reason, adding the build ID to the _rsc parameter allows it to function as a per-build cache bust, ensuring we don't kill client-side routing unnecessarily.
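The effect of the change can be sketched as follows. This is an illustration under stated assumptions rather than the PR's actual diff: `sketchHash` is a stand-in for Next.js's `hexHash`, and `RscRequestInputs`, `rscQueryWithoutBuildId`, `rscQueryWithBuildId`, and the build ID strings are hypothetical names chosen for the example.

```typescript
// Stand-in for hexHash: any deterministic string hash is enough for this sketch.
function sketchHash(input: string): string {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    // Multiply-and-xor step, kept in unsigned 32-bit range.
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// The three values the _rsc parameter currently depends on.
interface RscRequestInputs {
  prefetch: "0" | "1";
  stateTree: string;
  nextUrl: string;
}

// Current behavior: nothing build-specific goes into the hash.
function rscQueryWithoutBuildId(inputs: RscRequestInputs): string {
  return sketchHash([inputs.prefetch, inputs.stateTree, inputs.nextUrl].join(","));
}

// Proposed behavior: the build ID is an extra hash input.
function rscQueryWithBuildId(inputs: RscRequestInputs, buildId: string): string {
  return sketchHash([inputs.prefetch, inputs.stateTree, inputs.nextUrl, buildId].join(","));
}

const inputs: RscRequestInputs = {
  prefetch: "1",
  stateTree: '["",{"children":["__PAGE__",{},"/","refresh"]},null,null,true]',
  nextUrl: "/",
};

// Without the build ID, every deployment computes the same value for the same inputs.
const sharedQuery = rscQueryWithoutBuildId(inputs);

// With the build ID, each deployment gets its own value, so cached entries cannot collide.
const blueQuery = rscQueryWithBuildId(inputs, "build-blue");
const greenQuery = rscQueryWithBuildId(inputs, "build-green");

console.log(sharedQuery === rscQueryWithoutBuildId(inputs)); // true
console.log(blueQuery !== greenQuery); // true
```

Because the build ID only changes the query string, a cache that keys on the full URL separates the two deployments without any change to the Vary header.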

Response Headers

Vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch, Accept-Encoding

Request Headers

Rsc: 1
Next-Router-State-Tree: %5B%22%22%2C%7B%22children%22%3A%5B%22__PAGE__%22%2C%7B%7D%2C%22%2F%22%2C%22refresh%22%5D%7D%2Cnull%2Cnull%2Ctrue%5D
Next-Router-Prefetch: 1
Accept-Encoding: gzip, deflate
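For readers who want to check that the Next-Router-State-Tree header above is the same state tree that feeds the hash, here is a small sketch that decodes it. It only uses the standard `decodeURIComponent` and `JSON.parse`; the variable names are illustrative.

```typescript
// The header value exactly as it appears in the request headers above.
const encodedStateTree =
  "%5B%22%22%2C%7B%22children%22%3A%5B%22__PAGE__%22%2C%7B%7D%2C%22%2F%22%2C%22refresh%22%5D%7D%2Cnull%2Cnull%2Ctrue%5D";

// Percent-decoding yields the JSON string shown next to NEXT_ROUTER_STATE_TREE.
const decodedStateTree = decodeURIComponent(encodedStateTree);
console.log(decodedStateTree);
// ["",{"children":["__PAGE__",{},"/","refresh"]},null,null,true]

// The decoded value is valid JSON; its top level is an array.
const parsedStateTree: unknown = JSON.parse(decodedStateTree);
console.log(Array.isArray(parsedStateTree)); // true
```

Nothing in this value identifies the build that rendered the page, which is why two deployments with the same route structure send identical headers.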

This PR is linked with, but does not fully solve, https://github.com/vercel/next.js/issues/59986

Slightly related to https://github.com/vercel/next.js/discussions/59167
