immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Immich Web Reports Server "offline" and "version" unknown. #4611

Closed reefland closed 7 months ago

reefland commented 10 months ago

The bug

Not a new install; it has been running for many months, and Immich has been very stable for me. After the v1.82.1 upgrade, the web interface shows the server as "offline" and the version as "unknown": image

However, Immich seems to be fully operational. I can run jobs, view server stats, tested the new Repair option.

The mobile application connects fine and states "Client and Server are up-to-date".

The OS that Immich Server is running on

Ubuntu 22.04

Version of Immich Server

v1.82.1

Version of Immich Mobile App

1.82.0 build.106

Platform with the issue

Your docker-compose.yml content

Not using docker-compose. Deployed under Kubernetes (K3s) `v1.27.6+k3s1` using the Kustomize application template based on <https://github.com/onedr0p/home-ops/tree/main/kubernetes/apps/default/immich>, with minor changes because I deploy it from ArgoCD instead of FluxCD.

Your .env content

Instead of `.env` using a Kubernetes ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: immich-configmap
  namespace: immich
data:
  # Postgres Database
  DB_PORT: "5432"

  # URLS
  # IMMICH_MACHINE_LEARNING_URL: "false"
  IMMICH_MACHINE_LEARNING_URL: http://immich-machine-learning.immich.svc.cluster.local:3003
  IMMICH_SERVER_URL: http://immich-server.immich.svc.cluster.local:3001
  IMMICH_WEB_URL: http://immich-web.immich.svc.cluster.local:3000
  PUBLIC_IMMICH_SERVER_URL: https://photos.[REDACTED]

  LOG_LEVEL: verbose

  # REDIS
  REDIS_URL: ioredis://[REDACTED]
  REDIS_PORT: "6379"

  # TYPESENSE
  TYPESENSE_ENABLED: "true"
  TYPESENSE_HOST: immich-typesense.immich.svc.cluster.local
  TYPESENSE_DATA_DIR: /config
  TYPESENSE_PORT: "8108"
  TYPESENSE_PROTOCOL: http

Reproduction steps

1. Login to Immich web interface
2. Look at bottom left side of screen for server information

Additional information

I understand if Kubernetes is not supported; I'm not looking for support. I just wanted to report this odd issue.

I did see the reference in BREAKING CHANGES to: _We removed a section from the default docker-compose.yml that passed the IMMICH_SERVER_URL and IMMICH_WEB_URL environment variables to immich-proxy. If your setup requires those, make sure you keep them passed through._

As far as I can tell, these are still being passed. From a terminal within the WEB app:

set | grep "immich_*" | grep "URL"

IMMICH_MACHINE_LEARNING_URL='http://immich-machine-learning.immich.svc.cluster.local:3003'
IMMICH_SERVER_URL='http://immich-server.immich.svc.cluster.local:3001'
IMMICH_WEB_URL='http://immich-web.immich.svc.cluster.local:3000'

bo0tzz commented 10 months ago

PUBLIC_IMMICH_SERVER_URL is the address on which the web client tries to reach the server API when running in the browser. You probably have that misconfigured (and most likely you don't need to set it at all).

alextran1502 commented 10 months ago

Does the info show up if you hit Ctrl + F5 to reload the page? This is a hiccup with the new WebSocket implementation that we still need to fix.

raisinbear commented 10 months ago

FWIW, I only encounter this in Firefox private tabs. However, reloading the page still shows offline / unknown then.

reefland commented 10 months ago

> PUBLIC_IMMICH_SERVER_URL is the address on which the web client tries to reach the server API when running in the browser. You probably have that misconfigured (and most likely you don't need to set it at all).

Agreed. I originally did not have that. As part of troubleshooting, I compared my setup to others using the same Kubernetes template and noticed they had it. Figured I'd try it. I'll remove it as it didn't seem to do anything.

reefland commented 10 months ago

> Does the info show up if you hit Ctrl + F5 to reload the page? This is a hiccup with the new WebSocket implementation that we still need to fix.

No, it did not help. I'm using Firefox 118.0.1 on Linux (not a private window).

I tested in Chrome as well and reproduced the issue.

reefland commented 10 months ago

Looking at the developer tools in Firefox, I noticed this:

image

With redactions: image

reefland commented 10 months ago

image

martabal commented 10 months ago

I get the same issue on Firefox using OAuth2: on login, everything loads properly except the server status / version. Refreshing the page fixes it.

jrasm91 commented 10 months ago

I think there is an open bug for this. Basically, when the websocket connection is created initially, it requires authentication, so it doesn't work when you load the page unauthenticated and then log in.

I believe this is a duplicate of #4521

reefland commented 10 months ago

> I think there is an open bug for this. Basically, when the websocket connection is created initially, it requires authentication, so it doesn't work when you load the page unauthenticated and then log in.
>
> I believe this is a duplicate of #4521

When using the web interface, if I sign out and sign back in, it still does not work. Tried with Firefox and Chromium.

I haven't found any scenario where I can get the status and version to work correctly after upgrading. I don't think this is a duplicate.

jrasm91 commented 10 months ago

Hmm, OK. Can you reproduce this in an incognito tab in Chrome? Specifically, after logging in, if you refresh the page, does the websocket still never get established?

reefland commented 10 months ago

Here ya go...

image

Dev Tools in Incognito window: image

If I click any of the /api/socket.io/?EIO-.... links, it opens a new tab with:

{"code":1,"message":"Session ID unknown"}

jrasm91 commented 10 months ago

A few things:

IMO, we should focus on why the initial websocket connection is upgraded.

I think we will need more information or help to figure out where the "connection refused" error is originating and why. We can probably add some additional logging in the immich-server to help see what connections come in and give more details about any error situations.

reefland commented 10 months ago

On the immich-server side (I have 2 instances), neither reports anything that looks like an error. The logs look like:

[Nest] 7  - 10/23/2023, 5:55:57 PM     LOG [CommunicationRepository] New websocket connection: eOfTUucXVI8Z3tITABCE
[Nest] 7  - 10/23/2023, 5:55:57 PM     LOG [CommunicationRepository] Client eOfTUucXVI8Z3tITABCE disconnected from Websocket
[Nest] 7  - 10/23/2023, 5:56:07 PM     LOG [CommunicationRepository] New websocket connection: lOi8qzFPVNJ7g9tSABCK
[Nest] 7  - 10/23/2023, 5:56:07 PM     LOG [CommunicationRepository] Client lOi8qzFPVNJ7g9tSABCK disconnected from Websocket
[Nest] 7  - 10/23/2023, 5:56:27 PM     LOG [CommunicationRepository] New websocket connection: iuDuR3hTtI8sUXKDABCV
[Nest] 7  - 10/23/2023, 5:56:27 PM     LOG [CommunicationRepository] Client iuDuR3hTtI8sUXKDABCV disconnected from Websocket
[Nest] 7  - 10/23/2023, 5:58:27 PM     LOG [CommunicationRepository] New websocket connection: 1JVyt10Rmco9AAh5ABDZ
[Nest] 7  - 10/23/2023, 5:59:12 PM     LOG [CommunicationRepository] Client 1JVyt10Rmco9AAh5ABDZ disconnected from Websocket
[Nest] 7  - 10/23/2023, 5:59:49 PM     LOG [CommunicationRepository] New websocket connection: sa706Y51FGdirEN6ABEa
[Nest] 7  - 10/23/2023, 5:59:50 PM     LOG [CommunicationRepository] Client sa706Y51FGdirEN6ABEa disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:00:07 PM     LOG [CommunicationRepository] New websocket connection: dFRkCLDXbB31GqkjABEn
[Nest] 7  - 10/23/2023, 6:00:52 PM     LOG [CommunicationRepository] Client dFRkCLDXbB31GqkjABEn disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:01:22 PM     LOG [CommunicationRepository] New websocket connection: NPASRmV_65-CESeYABFi
[Nest] 7  - 10/23/2023, 6:02:07 PM     LOG [CommunicationRepository] Client NPASRmV_65-CESeYABFi disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:02:56 PM     LOG [CommunicationRepository] New websocket connection: L4tf15C4eKz9BuIhABGh
[Nest] 7  - 10/23/2023, 6:03:18 PM     LOG [CommunicationRepository] New websocket connection: acvUIdhzix6dUTc0ABG3
[Nest] 7  - 10/23/2023, 6:03:18 PM     LOG [CommunicationRepository] Client acvUIdhzix6dUTc0ABG3 disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:03:41 PM     LOG [CommunicationRepository] Client L4tf15C4eKz9BuIhABGh disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:04:32 PM     LOG [CommunicationRepository] New websocket connection: _1nvYFXoGRlMX8twABHs
[Nest] 7  - 10/23/2023, 6:04:33 PM     LOG [CommunicationRepository] Client _1nvYFXoGRlMX8twABHs disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:04:40 PM     LOG [CommunicationRepository] New websocket connection: J5WBIZwLP6GI435jABH0
[Nest] 7  - 10/23/2023, 6:05:25 PM     LOG [CommunicationRepository] Client J5WBIZwLP6GI435jABH0 disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:05:53 PM     LOG [CommunicationRepository] New websocket connection: f6FXhSygxZjzQmcmABIh
[Nest] 7  - 10/23/2023, 6:06:03 PM     LOG [CommunicationRepository] New websocket connection: b7zelzXFDZaCyUFrABIv
[Nest] 7  - 10/23/2023, 6:06:38 PM     LOG [CommunicationRepository] Client f6FXhSygxZjzQmcmABIh disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:06:46 PM     LOG [CommunicationRepository] New websocket connection: 9ReaSAs4YZckStIWABJW
[Nest] 7  - 10/23/2023, 6:06:46 PM     LOG [CommunicationRepository] Client 9ReaSAs4YZckStIWABJW disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:06:48 PM     LOG [CommunicationRepository] Client b7zelzXFDZaCyUFrABIv disconnected from Websocket
[Nest] 7  - 10/23/2023, 6:07:12 PM     LOG [CommunicationRepository] New websocket connection: 2PgemiKbfBGASK6eABJs

jrasm91 commented 10 months ago

Right, I don't think we currently log anything on connection errors.
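Even without error logging, the connect/disconnect pairs in the server log above can be mined for session lifetimes. The sketch below is one way to do that; the function names are made up for illustration, and it assumes the log format shown in this thread and that all lines fall on the same day. In the lines quoted in this thread, sessions last either about a second or exactly 45 seconds. That 45 seconds equals the sum of Socket.IO v4's default `pingInterval` (25 s) and `pingTimeout` (20 s), which would be consistent with the server giving up on a client whose follow-up requests never reach it, though the log alone does not prove that.

```javascript
// Sample lines copied from the log above (three sessions only).
const logLines = [
  "[Nest] 7  - 10/23/2023, 5:58:27 PM     LOG [CommunicationRepository] New websocket connection: 1JVyt10Rmco9AAh5ABDZ",
  "[Nest] 7  - 10/23/2023, 5:59:12 PM     LOG [CommunicationRepository] Client 1JVyt10Rmco9AAh5ABDZ disconnected from Websocket",
  "[Nest] 7  - 10/23/2023, 5:59:49 PM     LOG [CommunicationRepository] New websocket connection: sa706Y51FGdirEN6ABEa",
  "[Nest] 7  - 10/23/2023, 5:59:50 PM     LOG [CommunicationRepository] Client sa706Y51FGdirEN6ABEa disconnected from Websocket",
  "[Nest] 7  - 10/23/2023, 6:00:07 PM     LOG [CommunicationRepository] New websocket connection: dFRkCLDXbB31GqkjABEn",
  "[Nest] 7  - 10/23/2023, 6:00:52 PM     LOG [CommunicationRepository] Client dFRkCLDXbB31GqkjABEn disconnected from Websocket",
];

// Convert a 12-hour clock time such as "5:58:27 PM" to seconds since midnight.
function toSeconds(h, m, s, meridiem) {
  let hour = Number(h) % 12; // 12 AM becomes 0; 12 PM becomes 0 and gets +12 below
  if (meridiem === "PM") hour += 12;
  return hour * 3600 + Number(m) * 60 + Number(s);
}

// Pair each connect line with its disconnect line and return
// { socketId: lifetimeInSeconds }. Sessions that cross midnight are not handled.
function sessionDurations(lines) {
  const pattern =
    /(\d{1,2}):(\d{2}):(\d{2}) (AM|PM).*?(?:New websocket connection: (\S+)|Client (\S+) disconnected)/;
  const opened = new Map();
  const durations = {};
  for (const line of lines) {
    const match = pattern.exec(line);
    if (!match) continue;
    const [, h, m, s, meridiem, connectedId, disconnectedId] = match;
    const t = toSeconds(h, m, s, meridiem);
    if (connectedId) {
      opened.set(connectedId, t);
    } else if (opened.has(disconnectedId)) {
      durations[disconnectedId] = t - opened.get(disconnectedId);
      opened.delete(disconnectedId);
    }
  }
  return durations;
}

console.log(sessionDurations(logLines));
```

For the three sample sessions this yields lifetimes of 45, 1, and 45 seconds.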

image

JM-Lemmi commented 10 months ago

The issue for me even persists after deleting all the volumes and redownloading the current docker-compose.yml and .env from the installation instructions. For testing purposes even without changing the typesense and postgres passwords. So this should be a completely new setup, but the problem is the same.

alextran1502 commented 10 months ago

@JM-Lemmi the fix has not been released yet

JM-Lemmi commented 10 months ago

Right. I wanted to add more information to the troubleshooting. The original issue mentioned an existing setup being upgraded, but for me it also happened on a new setup. I'd be happy to assist in troubleshooting more.

jrasm91 commented 10 months ago

There is some additional error logging on the server for websocket connections that will be in the next release. Hopefully that will help determine the cause of the issue.

ThatCoffeeGuy commented 10 months ago

This also happens when you're running multiple instances of immich and have them open in the same browser session. (running one at port ending 3, the other with 4.)

reefland commented 10 months ago

> There is some additional error logging on the server for websocket connections that will be in the next release. Hopefully that will help determine the cause of the issue.

Has this additional logging been added? I upgraded to v1.84.0; the logging looks different, but there are still no error messages.

[Nest] 7  - 11/06/2023, 3:05:07 AM     LOG [NestApplication] Nest application successfully started +5ms
[Nest] 7  - 11/06/2023, 3:05:07 AM     LOG [ImmichServer] Immich Server is listening on http://[::1]:3001 [v1.84.0] [PRODUCTION] 
[Nest] 7  - 11/06/2023, 3:06:52 AM     LOG [CommunicationRepository] Websocket Connect:    D4tMKY7GY65bLVLfAAAB
[Nest] 7  - 11/06/2023, 3:07:37 AM     LOG [CommunicationRepository] Websocket Disconnect: D4tMKY7GY65bLVLfAAAB
[Nest] 7  - 11/06/2023, 3:09:07 AM     LOG [CommunicationRepository] Websocket Connect:    G_GiTIpv3B--PCSRAAAr
[Nest] 7  - 11/06/2023, 3:09:52 AM     LOG [CommunicationRepository] Websocket Disconnect: G_GiTIpv3B--PCSRAAAr
[Nest] 7  - 11/06/2023, 3:10:33 AM     LOG [CommunicationRepository] Websocket Connect:    blNGSLGB224-MOPPAABU
[Nest] 7  - 11/06/2023, 3:10:44 AM     LOG [CommunicationRepository] Websocket Connect:    qBjqkeHpLQYUVdDlAABZ
[Nest] 7  - 11/06/2023, 3:10:54 AM     LOG [CommunicationRepository] Websocket Connect:    fQkJGvNwl7QqzmGdAABb
[Nest] 7  - 11/06/2023, 3:10:55 AM     LOG [CommunicationRepository] Websocket Disconnect: fQkJGvNwl7QqzmGdAABb
[Nest] 7  - 11/06/2023, 3:11:14 AM     LOG [CommunicationRepository] Websocket Connect:    pAzXfQY_PPUX0nS3AABh
[Nest] 7  - 11/06/2023, 3:11:15 AM     LOG [CommunicationRepository] Websocket Disconnect: pAzXfQY_PPUX0nS3AABh
[Nest] 7  - 11/06/2023, 3:11:18 AM     LOG [CommunicationRepository] Websocket Disconnect: blNGSLGB224-MOPPAABU
[Nest] 7  - 11/06/2023, 3:11:29 AM     LOG [CommunicationRepository] Websocket Disconnect: qBjqkeHpLQYUVdDlAABZ
[Nest] 7  - 11/06/2023, 3:11:52 AM     LOG [CommunicationRepository] Websocket Connect:    NRTZ08L7g6ANQMX9AABs
[Nest] 7  - 11/06/2023, 3:12:18 AM     LOG [CommunicationRepository] Websocket Connect:    H0qbsFvndx8OGPXjAABw
[Nest] 7  - 11/06/2023, 3:12:19 AM     LOG [CommunicationRepository] Websocket Disconnect: H0qbsFvndx8OGPXjAABw
[Nest] 7  - 11/06/2023, 3:12:37 AM     LOG [CommunicationRepository] Websocket Disconnect: NRTZ08L7g6ANQMX9AABs
[Nest] 7  - 11/06/2023, 3:13:23 AM     LOG [CommunicationRepository] Websocket Connect:    YD0kE2mYxlGoQOxcAACO
[Nest] 7  - 11/06/2023, 3:14:08 AM     LOG [CommunicationRepository] Websocket Disconnect: YD0kE2mYxlGoQOxcAACO

jrasm91 commented 10 months ago

@reefland it has. The fact you are not seeing anything indicates that the connection itself is not making it to the server due to a networking issue or similar. Are websocket connections allowed at all levels of routing?

imtoanle commented 10 months ago

@reefland If you are using an nginx load balancer, then it's a problem between socket.io and the nginx ingress. Just add a configuration-snippet to nginx. https://socket.io/docs/v4/using-multiple-nodes/#nginx-ingress-kubernetes

reefland commented 10 months ago

@imtoanle - thanks for that. I'm using Traefik Ingress on Kubernetes, and they don't have an example for that one. They have one for use with Docker, which is just sticky sessions. The labels listed have equivalent Kubernetes annotations in the Traefik docs.

I enabled the two annotations on the API Service (not the API Ingress!) and it worked:

service:
  main:
    annotations:
      traefik.ingress.kubernetes.io/service.sticky.cookie.httponly: "true"
      traefik.ingress.kubernetes.io/service.sticky.cookie.name: "server_id"
    ports:
      http:
        port: 3001

image

Service when viewed in Traefik Dashboard will show the sticky cookie: image

So switching to websockets is going to give plenty of implementation-specific headaches.

imtoanle commented 10 months ago

@reefland I'm glad it helped. I also lost a lot of time on this issue; this should be put in the documentation to save time for others.

reefland commented 10 months ago

Maybe we need to wait and see what the Dev plans are with expansion of websockets. A solution which requires sticky sessions to be enabled raises questions. Normally API load would be round-robin between multiple instances, across multiple nodes in my cluster via the service. I have concerns with sticky sessions now getting in the way of that.

Hopefully someone more knowledgeable about Immich can chime in.

jrasm91 commented 10 months ago

> Maybe we need to wait and see what the Dev plans are with expansion of websockets. A solution which requires sticky sessions to be enabled raises questions. Normally API load would be round-robin between multiple instances, across multiple nodes in my cluster via the service. I have concerns with sticky sessions now getting in the way of that.
>
> Hopefully someone more knowledgeable about Immich can chime in.

Sticky sessions are only required if your web servers don't support upgrading HTTP connections to WebSockets.

reefland commented 10 months ago

But it didn't work unless a sticky session was applied to the Kubernetes Service that sits between immich-server and Traefik.

Client Browser -> Kube-VIP NLB -> 3 instances of Traefik [SSL Terminates here] -> Kubernetes Service [sticky session annotation here] -> 2 instances of immich-server.

Does it imply the Kubernetes Service does not support websockets?

jrasm91 commented 10 months ago

Correct. If you look at the network requests in the web, it tries to upgrade initially and if that fails, it falls back to polling. Polling requires sticky sessions for sure.

I would guess that is what is happening which means something in the chain is preventing the websocket connection.
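To illustrate why polling needs sticky sessions, here is a minimal simulation, not Immich or socket.io code. It assumes each server instance keeps its own in-memory table of session IDs; the class and function names are made up for the example. With round-robin routing the follow-up request lands on the instance that never issued the session ID, which reproduces the "Session ID unknown" response seen earlier in this thread.

```javascript
// Each instance keeps its own in-memory table of session IDs
// (simplified model: no shared state between instances).
class Instance {
  constructor(name) {
    this.name = name;
    this.sessions = new Set();
    this.counter = 0;
  }
  // Handshake: the first polling request creates a session on this instance.
  handshake() {
    const sid = `${this.name}-${++this.counter}`;
    this.sessions.add(sid);
    return { sid };
  }
  // Follow-up polling request: succeeds only on the instance that owns the sid.
  poll(sid) {
    return this.sessions.has(sid)
      ? { ok: true }
      : { code: 1, message: "Session ID unknown" };
  }
}

// Round-robin balancer: every request goes to the next instance in turn.
function roundRobin(instances) {
  let next = 0;
  return () => instances[next++ % instances.length];
}

// Sticky balancer: a cookie value pins a client to one instance.
function sticky(instances) {
  const pinned = new Map();
  let next = 0;
  return (cookie) => {
    if (!pinned.has(cookie)) pinned.set(cookie, instances[next++ % instances.length]);
    return pinned.get(cookie);
  };
}

// Round robin: handshake on server-a, follow-up poll on server-b.
const pickRoundRobin = roundRobin([new Instance("server-a"), new Instance("server-b")]);
const { sid } = pickRoundRobin().handshake();
const roundRobinResult = pickRoundRobin().poll(sid);

// Sticky: both requests from the same client reach the same instance.
const pickSticky = sticky([new Instance("server-a"), new Instance("server-b")]);
const session = pickSticky("client-1").handshake();
const stickyResult = pickSticky("client-1").poll(session.sid);

console.log(roundRobinResult.message); // Session ID unknown
console.log(stickyResult.ok);          // true
```

A pure WebSocket connection avoids the problem in this model because all traffic for a session travels over one long-lived connection to a single instance.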

bo0tzz commented 10 months ago

> Does it imply the Kubernetes Service does not support websockets?

Basic Kubernetes services are simple enough that it's not a question of whether they support things (so they do support websockets). It must be something else in the path.

conneryn commented 9 months ago

> Correct. If you look at the network requests in the web, it tries to upgrade initially and if that fails, it falls back to polling. Polling requires sticky sessions for sure.
>
> I would guess that is what is happening which means something in the chain is preventing the websocket connection.

By default, socket.io actually starts with polling, then attempts to upgrade.

Their reason:

While WebSocket is clearly the best way to establish a bidirectional communication, experience has shown that it is not always possible to establish a WebSocket connection, due to corporate proxies, personal firewall, antivirus software...

From the user perspective, an unsuccessful WebSocket connection can translate in up to at least 10 seconds of waiting for the realtime application to begin exchanging data. This perceptively hurts user experience.

Ref: https://socket.io/docs/v3/how-it-works/#upgrade-mechanism

Unfortunately, this default behaviour seems to cause a challenge upgrading to WebSockets when using multiple nodes. Their documentation strongly suggests either enabling sticky sessions or explicitly disabling "polling" completely.

That said, they also mention that we can change the default order:

import { io } from "socket.io-client";

const socket = io("https://example.com", {
  transports: ["websocket", "polling"] // use WebSocket first, if available
});

socket.on("connect_error", () => {
  // revert to classic upgrade
  socket.io.opts.transports = ["polling", "websocket"];
});

I am not sure why this isn't their preferred suggestion, as I can't see how this would be any worse of a user experience than disabling polling altogether, but there may be something I am unaware of?

Either way, I, personally, want to avoid sticky sessions and am happy to have a degraded experience for edge-case connections that still don't support Websockets, so I am very in favour of either disabling polling altogether, or changing the order... but I understand this may not be a shared view. I wonder if it would be possible to make this configurable?
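As a rough sketch of what "configurable" could mean here: the helper below maps a setting to the socket.io client's `transports` option. The mode names and the `socketOptionsFor` function are hypothetical and are not an existing Immich option; only the `transports` option itself comes from socket.io.

```javascript
// Hypothetical mode names; not an existing Immich setting.
const TRANSPORT_MODES = {
  "polling-first": ["polling", "websocket"],   // the default order described above
  "websocket-first": ["websocket", "polling"], // try WebSocket, fall back to polling
  "websocket-only": ["websocket"],             // polling disabled entirely
};

// Build the options object that would be passed as the second argument to io().
function socketOptionsFor(mode = "polling-first") {
  const transports = TRANSPORT_MODES[mode];
  if (!transports) throw new Error(`Unknown transport mode: ${mode}`);
  return { transports: [...transports] }; // copy so callers cannot mutate the table
}

console.log(socketOptionsFor("websocket-only")); // transports: ["websocket"]
```

It would be used as `io(url, socketOptionsFor("websocket-only"))`. Only the websocket-only mode removes polling altogether, which is the case the socket.io docs pair with not needing sticky sessions.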

P.S.: is anyone else amused by the line "up to at least 10 seconds"? 😆

jrasm91 commented 9 months ago

Oh interesting. Maybe we can try disabling polling, at least initially.

Emanuel-Bjurhager commented 7 months ago

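I updated to version v1.94.1 from version v1.93.3 the other day and this issue is still there. I have not had any time to research this issue but this is what I know so far:

  • I have only experienced this issue immediately after updating immich
  • I use pretty much the default docker compose setup as described in the wiki
  • I do not use a reverse proxy or similar. No nginx, no traefik. I only use http over LAN.
  • Everything works fine. I can use immich CLI, web is responsive as normal, AI stuff works etc. The only issue is that the server is marked as offline and version unknown. There are literally no other issues that I have found, so this issue seems to be only visual.
  • Since my setup is about 1 week old and very similar to the setup in the wiki, I believe this should be somewhat easy to replicate. I use Ubuntu Server 20 LTS. I have around 15,000 assets in the library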

nirjhar commented 7 months ago

> I updated to version v1.94.1 from version v1.93.3 the other day and this issue is still there. I have not had any time to research this issue but this is what I know so far:
>
>   • I have only experienced this issue immediately after updating immich
>   • I use pretty much the default docker compose setup as described in the wiki
>   • I do not use a reverse proxy or similar. No nginx, no traefik. I only use http over LAN.
>   • Everything works fine. I can use immich CLI, web is responsive as normal, AI stuff works etc. The only issue is that the server is marked as offline and version unknown. There are literally no other issues that I have found, so this issue seems to be only visual.
>   • Since my setup is about 1 week old and very similar to the setup in the wiki, I believe this should be somewhat easy to replicate. I use Ubuntu Server 20 LTS. I have around 15,000 assets in the library

Same here.

jrasm91 commented 7 months ago

What browser are you using?

Emanuel-Bjurhager commented 7 months ago

> What browser are you using?

Latest version of Brave. I have noticed during the past days that sometimes the issue is not there, then it comes back sometimes.

jrasm91 commented 7 months ago

Does it work in another browser?

xopek-by commented 7 months ago

Same here. I checked in three browsers - Safari, Edge and Chrome. In incognito mode as well. It appeared after updating to 1.94.1

jrasm91 commented 7 months ago

Starting in 1.94 the status/version info box requires a websocket connection. This must be enabled in your reverse proxy. Also, Brave browser blocks websockets when using "Brave Browser Shield".
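For readers wondering what "enabled in your reverse proxy" means in practice: the WebSocket opening handshake relies on the `Upgrade` and `Connection` request headers, which HTTP/1.1 treats as hop-by-hop, so a proxy typically does not forward them unless it is configured to. The sketch below is only a model of that behaviour with made-up function names and a placeholder hostname; it is not code from Immich or from any particular proxy.

```javascript
// Headers relevant to the handshake that HTTP/1.1 treats as hop-by-hop.
const HOP_BY_HOP = ["connection", "upgrade"];

// Returns true when the headers form a WebSocket opening handshake (RFC 6455).
function isWebSocketUpgrade(headers) {
  const h = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), String(v)])
  );
  const connectionTokens = (h["connection"] || "")
    .split(",")
    .map((t) => t.trim().toLowerCase());
  return (
    (h["upgrade"] || "").toLowerCase() === "websocket" &&
    connectionTokens.includes("upgrade") &&
    Boolean(h["sec-websocket-key"]) &&
    h["sec-websocket-version"] === "13"
  );
}

// Models a proxy that has not been told to pass the upgrade through.
function forwardWithoutUpgrade(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([k]) => !HOP_BY_HOP.includes(k.toLowerCase()))
  );
}

// What a browser sends when it tries to open a WebSocket (placeholder host).
const browserRequest = {
  Host: "photos.example.com",
  Upgrade: "websocket",
  Connection: "keep-alive, Upgrade",
  "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
  "Sec-WebSocket-Version": "13",
};

console.log(isWebSocketUpgrade(browserRequest));                        // true
console.log(isWebSocketUpgrade(forwardWithoutUpgrade(browserRequest))); // false
```

In the second case the backend sees an ordinary HTTP request, so the upgrade never happens and the status box stays offline.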

xopek-by commented 7 months ago

> Starting in 1.94 the status/version info box requires a websocket connection. This must be enabled in your reverse proxy. Also, Brave browser blocks websockets when using "Brave Browser Shield".

Oh, thank you so much. I read the release notes, but apparently I was either too blind or too stupid to see it at the time. I allowed websockets in the reverse proxy and it worked.

9k001 commented 6 months ago

> Starting in 1.94 the status/version info box requires a websocket connection. This must be enabled in your reverse proxy. Also, Brave browser blocks websockets when using "Brave Browser Shield".

I solved this problem a few days ago by adding WebSocket support in nginx, but it reappeared after upgrading to 1.97.0 today. Also, I want to know whether this status reflects the link between the server and the DB, or between the microservices and the DB.

jrasm91 commented 6 months ago

This is between the web and the server. The database and microservices are not involved.

9k001 commented 6 months ago

> This is between the web and the server. The database and microservices are not involved.

Thanks for your help, I solved the problem. This link is between WebUI and Server, and I solved this problem after adding websocket support to Nginx.