milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: A deleted vector appears again when restarting milvus standalone container #37782

Open · prodriguezu opened this issue 5 days ago

prodriguezu commented 5 days ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.15
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): N/A
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When I delete a vector entity, and an internal Milvus process then seals a segment and stores the binlogs in object storage (MinIO), restarting the Milvus standalone container makes the deleted vector appear again in the collection.

Below I describe the steps I followed. I can reproduce it both through the Attu interface and with the pymilvus client. I'll show the example using Attu so the process can be followed easily.

1) Add two vectors to the collection.

[Screenshot from 2024-11-18 12-09-52]

2) Remove one of them.

3) Wait for normal operation: an internal Milvus task generates binlogs in MinIO. I inspected the contents of the delta log folder and found the pk identifier of the deleted vector there.

4) Restart the Milvus standalone container; inspecting the data in the collection then shows both original vectors, including the one that was removed.

If I apply a flush operation after the deletion, this behavior does not occur: removed vectors do not reappear after restarting the containers. I should note that after flushing manually, the segment and partition id identifiers differ from those produced by an automatic flush. Check the image below.

[Screenshot from 2024-11-18 12-31-11]

Expected Behavior

I would expect vector deletion to remain consistent across restarts of the Milvus standalone container. From the documentation it is not easy to tell whether we must perform a flush operation after deleting entities. In fact, I suspect that sealing segments too often could degrade performance if many small segments are created.

Steps To Reproduce

1. Create a collection and add two entities.
2. Remove one of them.
3. Wait for Milvus's internal operations to store the logs in object storage.
4. Restart the milvus-standalone container.

Milvus Log

No response

Anything else?

I've been able to keep a vector gone from the system in a consistent manner if I apply a flush of the collection after deleting a vector. Is this something expected? I mean, it is mandatory to flush to permanently delete vector entities?

Many thanks in advance!

yanliang567 commented 5 days ago

@prodriguezu thank you for the detailed info about the issue. Quick questions:

  1. How do you deploy and restart the milvus-standalone container?
  2. What happens if you do a flush after restarting Milvus? Does it reproduce the issue?

/assign @prodriguezu
/assign @aoiasd
please help to take a look: does it mean the MQ did not persist the delete request?

prodriguezu commented 5 days ago

Thanks @yanliang567 for the quick answer.

How do you deploy and restart the milvus-standalone container?

This is the configuration from the docker-compose file, with all the components related to Milvus:

  milvus-standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: [ "milvus", "run", "standalone" ]
    environment:
        ETCD_ENDPOINTS: etcd:2379
        MINIO_ADDRESS: minio:9000
    ports:
        - "19530:19530"
        - "9091:9091"
    depends_on:
        - etcd
        - minio
    healthcheck:
        test: [ "CMD", "curl", "-f", "http://milvus-standalone:9091/healthz" ]
        interval: 30s
        start_period: 90s
        timeout: 20s
        retries: 3
    deploy:
        resources:
            limits:
                memory: 8G
    networks:
        - vcspnet
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.0
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    networks:
        - vcspnet
  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2024-05-10T01-41-38Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    networks:
        - vcspnet
  attu:
    container_name: attu
    image: zilliz/attu:v2.3.10
    ports:
        - "3001:3000"
    environment:
        - MILVUS_URL=milvus-standalone:19530
    restart: always
    depends_on:
        - milvus-standalone
    networks:
        - vcspnet

I just restart the milvus-standalone container with the Docker command: docker restart milvus-standalone.

I should mention that this also occurred in a k8s cluster deployment; I just reproduced it locally with the standalone deployment by following the steps described above.

What happens if you do a flush after restarting Milvus? Does it reproduce the issue?

After restarting the Milvus container, if I apply a flush over the collection, the deleted vector does indeed disappear from the collection (the expected behavior). But does that mean that after a restart of a container, or of a pod in k8s, we need to flush all the collections programmatically?

Thanks again for your support!

xiaofan-luan commented 5 days ago

/assign @congqixia
can you help on this? We need to add more unit test cases.

congqixia commented 4 days ago

@prodriguezu The issue here is that the MQ Milvus standalone uses is rocksmq, which stores its log files on "local disk". So the rocksmq files must be mapped to a local path or volume path so they are not lost when the pod restarts. May I ask where you got this docker-compose yaml file from?

congqixia commented 4 days ago

you could check this example yaml https://github.com/milvus-io/milvus/releases/download/v2.4.15/milvus-standalone-docker-compose.yml to configure persistence for the containers and avoid losing WAL data

prodriguezu commented 4 days ago

Thanks @congqixia for your answer!

May I ask where you got this docker-compose yaml file from?

I believe I took it from the documentation, but I may have modified it. I'm sorry I can't be more precise, but I did this a long time ago.

you could check this example yaml https://github.com/milvus-io/milvus/releases/download/v2.4.15/milvus-standalone-docker-compose.yml to configure persistence for the containers and avoid losing WAL data

I've added the volume path mapping to the container configuration, and still, if I restart the standalone container after a deletion and an automatic flush, the removed vector appears again.

congqixia commented 4 days ago

@prodriguezu thanks for the reply!

I've added the volume path mapping to the container configuration, and still, if I restart the standalone container after a deletion and an automatic flush, the removed vector appears again.

Yes, I did reproduce it with the steps you described. After some digging, it turns out to be caused by a known issue where some delta data goes missing for growing segments, which is fixed by #37599.

v2.4.16 will be released in the next few days. It should fix this problem, and I will try to reproduce it using the latest build.

I believe I took it from the documentation, but I may have modified it. I'm sorry I can't be more precise, but I did this a long time ago.

It's totally fine. I just wanted to make sure there is no misleading example yaml in our documentation.

prodriguezu commented 4 days ago

Thanks @congqixia for the great and kind support!

I'll wait then for next version to be released.

Kind regards!

prodriguezu commented 2 days ago

Thanks @congqixia for your response!

I've seen that v2.4.16 has been released. I've tried it locally and the problem persists... Maybe I should have waited a bit, since there are actually no release notes yet.

Anyway, I'll add some more comments and information that may be helpful for you.

  1. I've mapped the minio, etcd, and milvus volumes to local disk, as you recommended here.
  2. The misbehavior happens when, after deleting a vector, I wait for the Milvus data coordinator to perform the "SaveBinlogPaths sync segment with meta" operation and then restart.
  3. The behavior is as expected if:
    • the container is restarted before the automatic operation, or
    • a manual flush is performed (via MilvusClient or directly in the Attu interface).

I attach here some screenshots of the Milvus logs retrieved with: docker logs milvus-standalone -f | grep delta

  1. The moment when the data coordinator performs the automatic operation: [Screenshot from 2024-11-21 10-50-13]
  2. Right after the restart, when the deleted vector reappears: [Screenshot from 2024-11-21 10-51-33]

Could the last log line somehow be a lead on the problem causing this misbehavior?

Thank you very much in advance!

congqixia commented 2 days ago

/assign