milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Milvus standalone container crashes automatically #33782

Open hoangph3 opened 1 month ago

hoangph3 commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.12
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): no
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.15
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 24 cores, 64GB
- GPU: No
- Others:

Current Behavior

When I run docker-compose up -d with this compose file:

version: "3.0"

services:
  etcd:
    container_name: milvus_etcd
    image: hoangph3/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/db/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    restart: always
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus_minio
    image: hoangph3/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - /opt/db/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  milvus:
    image: hoangph3/milvus:v2.2.12
    container_name: milvus_standalone
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/db/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3

After a few seconds, the container crashes and restarts automatically, because I set restart: always. I tried the same setup on another machine and it works there. What's going on? I have attached log screenshots to this issue below.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

Restarting: [image]

Milvus logs before it crashes: [Screenshot from 2024-06-12 13-26-57] [Screenshot from 2024-06-12 10-27-29]

Anything else?

No response

yanliang567 commented 1 month ago

@hoangph3 could you please retry on the latest release, v2.4.4? If the issue reproduces there, please provide the Milvus logs for investigation. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

/assign @hoangph3 /unassign
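Once the logs have been exported with the docker-compose logs command above, a small filter can help narrow them down before attaching them. This is only a minimal sketch: it assumes log lines carry bracketed level tags such as [ERROR] or [FATAL], which this issue does not confirm, and the function names and the milvus.log path are illustrative.

```python
import re

# Assumption: log lines carry a bracketed level such as [ERROR] or [FATAL].
# Go runtime crashes are not level-tagged, so "panic:" and "fatal error:"
# are matched as well.
SUSPICIOUS = re.compile(r"\[(ERROR|FATAL|PANIC)\]|\bpanic:|\bfatal error:")


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that look like errors or crashes."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


def scan_file(path="milvus.log"):
    """Scan an exported log file; the default path is a hypothetical example."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling scan_file("milvus.log") returns the matching lines with their line numbers, which makes it easier to quote the relevant part of a large log in a comment.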

xiaofan-luan commented 1 month ago

What is the reason for the restart? Are there any fatal/error logs or K8s info?
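For a docker-compose deployment, the restart reason can often be read from the container state that docker inspect reports. The sketch below is a hedged example rather than an official procedure: the helper names are hypothetical, and milvus_standalone is the container name taken from the compose file in this issue.

```python
import json
import subprocess


def summarize_state(inspect_entry):
    """Condense one `docker inspect` entry into the fields that explain a restart."""
    state = inspect_entry.get("State", {})
    exit_code = state.get("ExitCode")
    summary = {
        "status": state.get("Status"),
        "exit_code": exit_code,
        "oom_killed": state.get("OOMKilled", False),
        "restart_count": inspect_entry.get("RestartCount", 0),
        "signal": None,
    }
    # Exit codes above 128 conventionally mean "terminated by signal (code - 128)",
    # e.g. 137 -> SIGKILL (9), 132 -> SIGILL (4), 134 -> SIGABRT (6).
    if isinstance(exit_code, int) and exit_code > 128:
        summary["signal"] = exit_code - 128
    return summary


def inspect_container(name="milvus_standalone"):
    """Run `docker inspect` for one container and summarize its state."""
    output = subprocess.run(
        ["docker", "inspect", name], check=True, capture_output=True, text=True
    ).stdout
    # `docker inspect` prints a JSON array with one entry per inspected object.
    return summarize_state(json.loads(output)[0])
```

An oom_killed value of True points at memory pressure, while a signal number hints at how the process died; neither replaces the full logs.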

hoangph3 commented 1 month ago

@yanliang567 It isn't easy to do that because this is a standalone product we deploy on top of our customer infrastructure. To update the Milvus version, we need to create a change request with multiple steps of signing and approving the request. It's not a big deal; it only consumes time waiting for customer approval. But, can you ensure this problem does not happen in the future after we update the Milvus version to 2.4.4?

@xiaofan-luan I don't know, I use docker-compose only, not k8s. But when I try it on other machines, it's working. Are there infrastructure factors that could affect Milvus?
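One infrastructure difference that may be worth ruling out when the same image works on one machine and not on another is CPU instruction-set support. Nothing in this thread confirms it as the cause here. The sketch below assumes a Linux host and assumes the SIMD sets named in the Milvus prerequisites (SSE4.2, AVX, AVX2, AVX-512), written as /proc/cpuinfo flag names; the function names are hypothetical.

```python
# Assumption: the SIMD sets from the Milvus prerequisites, spelled the way Linux
# reports them in /proc/cpuinfo (avx512f stands in for AVX-512).
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "avx512f")


def simd_support(cpuinfo_text):
    """Map each SIMD flag to True/False based on the first 'flags' line found."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            break
    return {flag: flag in flags for flag in SIMD_FLAGS}


def read_host_support(path="/proc/cpuinfo"):
    """Read the CPU flags of the current host (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return simd_support(handle.read())
```

Running read_host_support() on both the working and the failing machine and comparing the results is a quick way to see whether the hosts differ in this respect.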

yanliang567 commented 1 month ago

Please provide the full Milvus logs, as described in the comment above. @hoangph3

xiaofan-luan commented 1 month ago

Without enough information, it might be hard for us to give a clue. Can you collect the logs?

hoangph3 commented 5 days ago

Yes, please help me fix it. This problem keeps recurring in my production environment, and currently I cannot restart the service. Logs attached: milvus_log.zip. Note that the Milvus version is v2.2.12. @yanliang567 @xiaofan-luan