vercel / next.js

The React Framework
https://nextjs.org
MIT License

[NEXT-481] preview doesn't work when there are more than 1 nextjs instance running #39294

Open mikalai-t opened 1 year ago

mikalai-t commented 1 year ago

Verify canary release

Provide environment information

npx --no-install next info

Operating System:
  Platform: linux
  Arch: x64
  Version: #1 SMP Wed Jul 13 21:34:30 UTC 2022
Binaries:
  Node: 16.16.0
  npm: 8.11.0
  Yarn: 1.22.19
  pnpm: N/A
Relevant packages:
  next: 12.2.4-canary.9
  eslint-config-next: 12.2.0
  react: 17.0.2
  react-dom: 17.0.2

warn - Latest canary version not detected, detected: "12.2.4-canary.9", newest: "12.2.4-canary.10". Please try the latest canary version (npm install next@canary) to confirm the issue still exists before creating a new issue. Read more - https://nextjs.org/docs/messages/opening-an-issue

What browser are you using? (if relevant)

Google Chrome 103.0.5060.134 (Official Build) (64-bit)

How are you deploying your application? (if relevant)

Kubernetes (AWS EKS) running node:16-alpine Docker image with "next start" as main process (pid 1)

Describe the Bug

When more than one instance of the Docker container is running, the "preview" feature intermittently fails and renders the regular web page (the version last published in Sanity). This happens because the second instance invalidates the "preview" cookies by setting them to an empty value with an expiration date in the past (see attached diagram).

[Diagram: preview issue]
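When watching the responses in a setup like this, a small helper can flag which Set-Cookie headers clear the preview cookies. The following is a minimal TypeScript sketch, not Next.js code: `clearedPreviewCookies` and `PREVIEW_COOKIES` are hypothetical names, and it assumes the preview cookies are `__prerender_bypass` and `__next_preview_data`.

```typescript
// Cookie names assumed to be used by Next.js preview mode.
const PREVIEW_COOKIES = ["__prerender_bypass", "__next_preview_data"];

// Returns the names of preview cookies that a response clears, i.e. sets to
// an empty value with an expiry that is not in the future.
function clearedPreviewCookies(
  setCookieHeaders: string[],
  now: Date = new Date()
): string[] {
  const cleared: string[] = [];
  for (const header of setCookieHeaders) {
    const [pair, ...attrs] = header.split(";").map((part) => part.trim());
    const eq = pair.indexOf("=");
    if (eq < 0) continue;
    const name = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    // Only empty-valued preview cookies are candidates.
    if (!PREVIEW_COOKIES.includes(name) || value !== "") continue;
    const expired = attrs.some((attr) => {
      const [key, val = ""] = attr.split("=");
      const k = key.toLowerCase();
      if (k === "expires") return new Date(val).getTime() <= now.getTime();
      if (k === "max-age") return Number(val) <= 0;
      return false;
    });
    if (expired) cleared.push(name);
  }
  return cleared;
}
```

Passing it the Set-Cookie headers of each response (grouped by the pod that served it) should show whether the clearing always comes from a different instance than the one that set the cookies.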

It works when exactly one instance of the application is running in the cluster. BUILD_ID is the same for both instances, and the "preview secret" string is the same (shared via a Kubernetes Secret). Using this string, it's possible to validate the signature of the token stored in __next_preview_data with jwt.io.
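The jwt.io check described above can also be done locally. This is a minimal sketch under the assumption that the token in `__next_preview_data` is a compact JWT signed with HS256; `hasValidHs256Signature` is a hypothetical helper, and `secret` is whichever key the instance actually signed with. It checks the signature only, not expiry or other claims.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies the HS256 signature of a compact JWT against a shared secret.
// Assumption: the token has the usual header.payload.signature layout.
function hasValidHs256Signature(token: string, secret: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, payload, signature] = parts;
  // Recompute the signature over "header.payload" with the given secret.
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  const actualBytes = Buffer.from(signature);
  const expectedBytes = Buffer.from(expected);
  // timingSafeEqual requires equal lengths, so compare lengths first.
  return (
    actualBytes.length === expectedBytes.length &&
    timingSafeEqual(actualBytes, expectedBytes)
  );
}
```

Running the same token against the key available in each pod would show whether every instance can verify cookies issued by the others.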

Expected Behavior

When several instances are running, one instance shouldn't clear cookies set by another.

Link to reproduction

n/a

To Reproduce

  1. Enable the preview feature in Sanity Studio and deploy it.
  2. Create a simple Next.js application connected to Sanity CMS.
  3. Deploy multiple Next.js instances on on-premise infrastructure behind a reverse proxy (a load balancer).
  4. In Sanity Studio, try to switch from "editor" to "preview" and watch the requests, responses, and cookies.

jankaifer commented 1 year ago

Have you managed to fix/get around this issue?

github-actions[bot] commented 1 year ago

Please verify that your issue can be recreated with next@canary.

Why was this issue marked with the please verify canary label?

We noticed the provided reproduction was using an older version of Next.js, instead of canary.

The canary version of Next.js ships daily and includes all features and fixes that have not been released to the stable version yet. You can think of canary as a public beta. Some issues may already be fixed in the canary version, so please verify that your issue reproduces by running npm install next@canary and test it in your project, using your reproduction steps.

If the issue does not reproduce with the canary version, then it has already been fixed and this issue can be closed.

How can I quickly verify if my issue has been fixed in canary?

The safest way is to install next@canary in your project and test it, but you can also search through closed Next.js issues for duplicates or check the Next.js releases. You can also use the GitHub template (preferred), or the CodeSandbox or StackBlitz templates to create a reproduction with canary from scratch.

My issue has been open for a long time, why do I need to verify canary now?

Next.js does not backport bug fixes to older versions of Next.js. Instead, we are trying to introduce only a minimal amount of breaking changes between major releases.

What happens if I don't verify against the canary version of Next.js?

An issue with the please verify canary label that receives no meaningful activity (e.g. new comments that acknowledge verification against canary) will be automatically closed and locked after 30 days.

If your issue has not been resolved in that time and it has been closed/locked, please open a new issue, with the required reproduction, using next@canary.

I did not open this issue, but it is relevant to me, what can I do to help?

Anyone experiencing the same issue is welcome to provide a minimal reproduction following the above steps. Furthermore, you can upvote the issue using the :+1: reaction on the topmost comment (please do not comment "I have the same issue" without repro steps). Then, we can sort issues by votes to prioritize.

I think my reproduction is good enough, why aren't you looking into it quicker?

We look into every Next.js issue and constantly monitor open issues for new comments.

However, sometimes we might miss one or two due to the popularity/high traffic of the repository. We apologize, and kindly ask you to refrain from tagging core maintainers, as that will usually not result in increased priority.

Upvoting issues to show your interest will help us prioritize and address them as quickly as possible. That said, every issue is important to us, and if an issue gets closed by accident, we encourage you to open a new one linking to the old issue and we will look into it.

mikalai-t commented 1 year ago

@JanKaifer Thank you for the response. I have. Well, it's not a fix but a workaround: sticky sessions based on the source IP address, so that each user is always routed to the same Docker instance they visited first. It slightly breaks load balancing, but I don't care until there are millions of requests per second, which seems unreachable in the near future.
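The idea behind source-IP sticky sessions can be sketched as follows. This is an illustration of the concept only, not the configuration used in this thread: `pickBackend` is a hypothetical function, and it uses simple modulo hashing, whereas real load balancers typically use consistent hashing so that fewer clients are remapped when the number of instances changes.

```typescript
import { createHash } from "node:crypto";

// Picks a backend for a client by hashing its source IP, so the same IP
// always maps to the same instance while the backend list is unchanged.
function pickBackend(sourceIp: string, backends: string[]): string {
  if (backends.length === 0) throw new Error("no backends available");
  const digest = createHash("sha256").update(sourceIp).digest();
  // Use the first 4 bytes of the digest as an unsigned integer.
  const index = digest.readUInt32BE(0) % backends.length;
  return backends[index];
}
```

Because the mapping depends only on the IP and the list of backends, the instance that set the preview cookies is also the one that reads them on later requests.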

Multiply commented 1 year ago

We're running with many deployments (pods/containers) just fine; it's only a problem between releases (different next build).

It is however super annoying that the default response is a 500 error, so all content creators are kicked off with every release.

mikalai-t commented 1 year ago

@Multiply Hmm... would you mind sharing your versions? To get a consistent build ID we followed https://nextjs.org/docs/api-reference/next.config.js/configuring-the-build-id , but that certainly isn't about "between releases"; it's just to get the same build ID in all the Docker containers running the current version.

Also, we didn't notice any 500 errors appearing right after a new version is deployed.

And just to clarify: when you said "we're running just fine", you were talking about the Next.js preview feature, right? Does it work out of the box for you in your containerized environment? Could you briefly describe it, without sharing sensitive info: what type of CI/build system, and which runtime environment (EKS, AKS, GCE, or some kind of on-premise setup)?
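For reference, the consistent build ID mentioned above relies on the `generateBuildId` option from the linked docs page. The following is a minimal sketch, not the configuration used in this thread: the environment variable name `GIT_COMMIT_SHA` is a hypothetical choice, and in a real `next.config.js` the object would be assigned to `module.exports`.

```typescript
// Hypothetical environment variable, assumed to be set by CI to the commit
// being built so that every container reports the same build ID.
const BUILD_ID_ENV = "GIT_COMMIT_SHA";

const nextConfig = {
  // generateBuildId is the next.config.js option for supplying a build ID.
  generateBuildId: async (): Promise<string> => {
    const id = process.env[BUILD_ID_ENV];
    // Fail the build instead of silently falling back to a different ID.
    if (!id) throw new Error(`${BUILD_ID_ENV} is not set`);
    return id;
  },
};
```

Failing loudly when the variable is missing is a design choice here: a silent fallback could let two containers end up with different build IDs without anyone noticing.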

Multiply commented 1 year ago

Are you building on boot of the container? We build and publish containers, and run the same SHAs across all containers, except while a new release is rolling out.

cruno91 commented 6 months ago

Hi @mikalai-t, sorry to dig up an older thread, but do you have documentation on how to implement sticky sessions? We're experiencing the same issue in a multi-pod Kubernetes setup.

mikalai-t commented 6 months ago

@cruno91 It depends on the networking solution; in our case it's Istio (Gateway + VirtualService). I believe the settings will be different for the Nginx Ingress Controller or the AWS Load Balancer Controller...