immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] Machine learning memory leak #3142

Closed rafsko1 closed 1 year ago

rafsko1 commented 1 year ago

The bug

The Yacht dashboard shows that immich_machine_learning is consuming between 20% and 60% of RAM.

The OS that Immich Server is running on

Debian

Version of Immich Server

v1.66.1

Version of Immich Mobile App

v1.66.0

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/srv/dev-disk-by-uuid-73d1767c-8c6f-4fd0-bf8b-da093241cda3/Backup/Immich/

###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=blablablahq1q1q1!!!
# TYPESENSE_ENABLED=false

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=Hello!

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

sudo docker-compose pull && sudo docker-compose up -d

Additional information

No response

bo0tzz commented 1 year ago

How much RAM, in MB or GB, is it actually using? We can't do much with just a percentage.
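One way to get the absolute figure asked for here is to read it from `docker stats`. The following is a minimal sketch, not part of Immich: it assumes the container name `immich_machine_learning` from the compose file above and that the Docker CLI is on the PATH. The helper names (`parse_size`, `parse_mem_usage`, `container_memory`) are hypothetical.

```python
import re
import subprocess

# Binary units as printed by `docker stats` (e.g. "1.25GiB / 7.63GiB").
_UNITS = {"b": 1, "kib": 1024, "mib": 1024**2, "gib": 1024**3, "tib": 1024**4}


def parse_size(text):
    """Convert a size string such as '893.4MiB' to a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit.lower()]


def parse_mem_usage(mem_usage):
    """Split a 'used / limit' string into (used_bytes, limit_bytes)."""
    used, limit = mem_usage.split("/")
    return parse_size(used), parse_size(limit)


def container_memory(name="immich_machine_learning"):
    """Ask the Docker CLI for one memory sample of the given container."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", name],
        capture_output=True, text=True, check=True,
    )
    return parse_mem_usage(result.stdout.strip())
```

Calling `container_memory()` on the Docker host returns the used and limit values in bytes, which can then be reported in MiB or GiB alongside the percentage.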

weber8thomas commented 1 year ago

Same problem here

Updated to 1.66.1 2 days ago

image

Here are my current docker stats

image

You can clearly see the difference since the last update

image
bo0tzz commented 1 year ago

@weber8thomas which version were you running before you updated?

rafsko1 commented 1 year ago

Now it's consuming 17% (1.25 GB / 7.63 GB), but I've seen it consuming almost 6 GB of 8 GB.

weber8thomas commented 1 year ago

@weber8thomas which version were you running before you updated?

I was running 1.65.0

bo0tzz commented 1 year ago

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

weber8thomas commented 1 year ago

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

The point is that no processes are running (CPU between 0% and 1% in docker stats) and no jobs are listed on the admin dashboard of the web UI. That's why this is unusual compared to the previous versions.

mertalev commented 1 year ago

Same problem here

Updated to 1.66.1 2 days ago

image

Here are my current docker stats

image

You can clearly see the difference since the last update

image

Please make a new issue for this. While it's expected that ML will use a high amount of RAM, there are unusual spikes here. Also be sure to mention the version you were using before updating and to post the ML logs.

rafsko1 commented 1 year ago
Screenshot 2023-07-09 at 00 10 25

Exactly the same story here.

rafsko1 commented 1 year ago
Screenshot 2023-07-09 at 13 39 34

Now it's at 63% (4.81 GB / 7.63 GB), and it doesn't look like it will drop.

Dodo55 commented 1 year ago

@bo0tzz

I don't think anything unusual is happening here - ML is expected to use quite a bit of RAM while processing is running, and after a bit of inactivity the models will be unloaded and the RAM usage will go down.

Sorry, but I can confirm that no unloading takes place within a reasonable timeframe, and the RAM hogging / memory leak gets worse when another set of jobs runs on a later occasion. (See the attached images as proof.)

Please investigate and fix this issue as soon as possible.

Meanwhile, I'm thinking of creating a cron job that restarts the ML container every hour as a temporary workaround. Can it cause any trouble?

Immich_memoryleak2

Immich_memoryleak

mertalev commented 1 year ago

Running a cronjob for it shouldn't cause an issue.

I think model unloading is causing a memory leak. The first time they're unloaded you can see a small decrease, but the next time RAM usage swells up further.
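The restart workaround discussed above could also be made conditional, so the container is only restarted when its memory usage is actually high. The following is a minimal sketch under stated assumptions: the container name comes from the compose file above, the 50% threshold is a hypothetical value, and the function names are illustrative. `docker stats --no-stream --format` and `docker restart` are standard Docker CLI commands. Nothing here is an Immich feature.

```python
import subprocess

CONTAINER = "immich_machine_learning"  # container_name from the compose file
THRESHOLD_PERCENT = 50.0               # hypothetical limit; tune for your host


def parse_percent(text):
    """Convert docker's MemPerc output such as '63.04%' to a float."""
    return float(text.strip().rstrip("%"))


def should_restart(mem_percent, threshold=THRESHOLD_PERCENT):
    """Restart only when usage is at or above the threshold."""
    return mem_percent >= threshold


def current_mem_percent(container=CONTAINER):
    """Take one memory sample (percent of the container's limit)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemPerc}}", container],
        capture_output=True, text=True, check=True,
    )
    return parse_percent(result.stdout)


def restart_if_needed(container=CONTAINER):
    """Restart the container if it is over the threshold; report what happened."""
    if should_restart(current_mem_percent(container)):
        subprocess.run(["docker", "restart", container], check=True)
        return True
    return False
```

A script that calls `restart_if_needed()` could be scheduled hourly from cron in place of an unconditional restart. Without a memory limit on the container, the percentage is relative to the host's total memory.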

vikrant82 commented 1 year ago

I don't think this is fixed. I am on 1.71.0 and I am still seeing machine learning taking up around 1.6 GB of memory out of 8 GB.

image

rafsko1 commented 1 year ago

Same here. 27%, 2.05 GB / 7.63 GB

vikrant82 commented 1 year ago

Is there a way to disable machine learning? Setting IMMICH_MACHINE_LEARNING_URL=false and removing the machine learning container didn't seem to help, as the Immich server kept crashing.

mertalev commented 1 year ago

That memory usage is completely normal.

Is there a way to disable machine learning? Setting IMMICH_MACHINE_LEARNING_URL=false and removing the machine learning container didn't seem to help, as the Immich server kept crashing.

Could you share the logs for the server?

vikrant82 commented 1 year ago

OK, based on the discussion above I thought the container was supposed to unload the models once it is idle. Is there any documentation on how to switch off machine learning?

Thanks..

mertalev commented 1 year ago

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.

Watever44 commented 1 year ago

Model unloading is currently disabled by default since it can cause a memory leak.

As for disabling machine learning, the steps you mention are all that should be needed. If the server is crashing, I'd need to see the logs to help you.

If the models are not unloaded, does that mean the machine learning container's memory usage can keep increasing? That's what I am seeing. I also see an increase at midnight, so I suppose that's when some jobs are run; I didn't find where that is set. Example:

20:45 -> 1.130 GiB
23:59 -> 893.371 MiB
00:08 -> 2.590 GiB
01:47 -> 2.294 GiB
11:10 -> 2.470 GiB
(keeps increasing)
19:45 -> 2.705 GiB
(reboot)
19:54 -> 1.093 GiB
20:02 -> 2.086 GiB

I am not sure if it's the same issue with model loading or something else.
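To make the pattern in these readings easier to see, the changes between consecutive samples can be computed directly. This is only a sketch over the numbers quoted in the comment above. It assumes "gig" means GiB and "mib" means MiB (the units `docker stats` prints), and the names `deltas` and `largest_jump` are illustrative.

```python
# Readings transcribed from the comment above, stored in MiB.
# Assumption: "gig" means GiB and "mib" means MiB.
GIB = 1024  # MiB per GiB

BEFORE_REBOOT = [
    ("20:45", 1.130 * GIB),
    ("23:59", 893.371),
    ("00:08", 2.590 * GIB),
    ("01:47", 2.294 * GIB),
    ("11:10", 2.470 * GIB),
    ("19:45", 2.705 * GIB),
]
AFTER_REBOOT = [
    ("19:54", 1.093 * GIB),
    ("20:02", 2.086 * GIB),
]


def deltas(readings):
    """Return (time, change in MiB) for each reading relative to the previous one."""
    return [(t2, m2 - m1) for (_, m1), (t2, m2) in zip(readings, readings[1:])]


def largest_jump(readings):
    """Return the (time, change in MiB) pair with the biggest increase."""
    return max(deltas(readings), key=lambda pair: pair[1])
```

On these numbers the largest increase before the reboot falls between 23:59 and 00:08, at roughly 1.7 GiB, which matches the observation that usage rises around midnight.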

mertalev commented 1 year ago

Models are loaded on-demand now, so the container will have lower RAM usage until then. Models won't be unloaded after this by default, though. RAM usage can also vary based on the images sent and the number of concurrent requests.