prodriguezu opened this issue 5 days ago
@prodriguezu thank you for the detailed info about the issue. Quick questions:
/assign @aoiasd
please help take a look. Does it mean the MQ did not persist the delete request?
Thanks @yanliang567 for the quick answer.
how do you deploy and restart the milvus-standalone container?
This is the configuration in the docker compose file with all the components related to Milvus:
```yaml
milvus-standalone:
  container_name: milvus-standalone
  image: milvusdb/milvus:v2.4.15
  command: [ "milvus", "run", "standalone" ]
  environment:
    ETCD_ENDPOINTS: etcd:2379
    MINIO_ADDRESS: minio:9000
  ports:
    - "19530:19530"
    - "9091:9091"
  depends_on:
    - etcd
    - minio
  healthcheck:
    test: [ "CMD", "curl", "-f", "http://milvus-standalone:9091/healthz" ]
    interval: 30s
    start_period: 90s
    timeout: 20s
    retries: 3
  deploy:
    resources:
      limits:
        memory: 8G
  networks:
    - vcspnet

etcd:
  container_name: milvus-etcd
  image: quay.io/coreos/etcd:v3.5.0
  environment:
    - ETCD_AUTO_COMPACTION_MODE=revision
    - ETCD_AUTO_COMPACTION_RETENTION=1000
    - ETCD_QUOTA_BACKEND_BYTES=4294967296
  command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
  networks:
    - vcspnet

minio:
  container_name: milvus-minio
  image: minio/minio:RELEASE.2024-05-10T01-41-38Z
  environment:
    MINIO_ACCESS_KEY: minioadmin
    MINIO_SECRET_KEY: minioadmin
  command: minio server /minio_data
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
    interval: 30s
    timeout: 20s
    retries: 3
  networks:
    - vcspnet

attu:
  container_name: attu
  image: zilliz/attu:v2.3.10
  ports:
    - "3001:3000"
  environment:
    - MILVUS_URL=milvus-standalone:19530
  restart: always
  depends_on:
    - milvus-standalone
  networks:
    - vcspnet
```
I just restart the milvus-standalone container with the docker command: docker restart milvus-standalone.
I must say that this also occurred in a k8s cluster deployment, and I just reproduced it locally with a standalone deployment by following the described steps.
what if you do a flush after restarting Milvus? Does the issue still reproduce?
After restarting the Milvus container, if I apply a flush on the collection, the deleted vector indeed disappears from the collection (the expected behavior). But does that mean that after a restart of the container, or of a pod in k8s, we need to flush all the collections programmatically?
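For context, this is roughly what I mean by flushing all collections programmatically (a minimal pymilvus sketch, assuming a local standalone instance; connection details are placeholders):

```python
from pymilvus import Collection, connections, utility

# Assumed local standalone endpoint.
connections.connect(host="localhost", port="19530")

# Flush every collection after the container comes back up,
# so pending deletes are persisted to object storage.
for name in utility.list_collections():
    Collection(name).flush()
```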
Thanks again for your support!
/assign @congqixia can you help on this? We need to add more unit test cases.
@prodriguezu The issue here is that the MQ Milvus standalone uses is RocksMQ, which stores its log files on the local disk. So the RocksMQ data must be mapped to a local path or a volume, otherwise it is lost when the pod restarts. May I ask where you got this docker-compose yaml file from?
you could check this example yaml https://github.com/milvus-io/milvus/releases/download/v2.4.15/milvus-standalone-docker-compose.yml to configure docker volume persistence and avoid losing WAL data
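Concretely, the important part is mapping the standalone data directory to a volume, roughly like this (a minimal sketch; the host path and the /var/lib/milvus target are taken from the linked example, so double-check them against that file):

```yaml
milvus-standalone:
  # ...existing settings from your compose file...
  volumes:
    # RocksMQ (the WAL) and other local state live under /var/lib/milvus inside
    # the container; without this mapping they are lost on container restart.
    - ./volumes/milvus:/var/lib/milvus
```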
Thanks @congqixia for your answer!
May I ask where you got this docker-compose yaml file from?
I guess that I took it from the documentation, but maybe I modified it. I'm sorry that I cannot be more accurate, but I did this a long time ago.
you could check this example yaml https://github.com/milvus-io/milvus/releases/download/v2.4.15/milvus-standalone-docker-compose.yml to configure docker volume persistence and avoid losing WAL data
I've added the volume path mapping to the container configuration, and still, if I restart the standalone container after a deletion and an automatic flush, the removed vector appears again.
@prodriguezu thanks for the reply!
I've added the volume path mapping to the container configuration, and still, if I restart the standalone container after a deletion and an automatic flush, the removed vector appears again.
Yes, I did reproduce it with the steps you described. After some digging, it's caused by a known issue where some delta data goes missing for growing segments, which is fixed by #37599
v2.4.16 will be released in the next few days. It should fix this problem, and I will try to reproduce it again using the latest build.
I guess that I took it from the documentation, but maybe I modified it. I'm sorry that I cannot be more accurate, but I did this a long time ago.
It's totally fine. Just wanted to make sure that there is no misleading example yaml in our documents.
Thanks @congqixia for the great and kind support!
I'll wait for the next version to be released, then.
Kind regards!
Thanks @congqixia for your response!
I've seen that v2.4.16 has been released. I've tried it locally and the problem persists... Maybe I should have waited a little bit, since there are no release notes yet.
Anyway, I'm adding more comments and information that may be helpful for you.
I attach here some screenshots of the Milvus logs retrieved with: docker logs milvus-standalone -f | grep delta
Could the last log line somehow be a lead on the problem causing this misbehavior?
Thank you very much in advance!
/assign
Is there an existing issue for this?
Environment
Current Behavior
When I delete a vector entity and an internal Milvus process seals the segment and stores the binlogs in object storage (MinIO), if I then restart the Milvus standalone container, the deleted vector appears again in the collection.
Next, I describe the steps I followed. I'm able to reproduce it using the Attu interface and using the pymilvus client. I'll show the example using Attu so the process can be easily followed (a rough pymilvus equivalent is sketched after the list).
1) Add two vectors to the collection.
2) Remove one of them.
3) Wait for normal operation. An internal Milvus task generates binlogs in MinIO. I've inspected the content of the delta log folder and it contains the pk identifier of the deleted vector.
4) Restart the Milvus standalone container; inspecting the data in the collection then returns the two original vectors, including the removed one.
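The rough pymilvus equivalent of these steps (a sketch only; the collection name, schema, and dimension are made up for illustration, and the restart in step 4 happens outside Python):

```python
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="localhost", port="19530")

# Hypothetical two-field collection, only to illustrate the steps.
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=8),
]
collection = Collection("repro_demo", CollectionSchema(fields))
collection.create_index("vec", {"index_type": "FLAT", "metric_type": "L2"})
collection.load()

# 1) Add two vectors to the collection.
collection.insert([[1, 2], [[0.1] * 8, [0.2] * 8]])

# 2) Remove one of them.
collection.delete("pk == 1")

# 3) Wait for Milvus to seal and flush the segment on its own
#    (deliberately no explicit collection.flush() call here).

# 4) Restart the container (docker restart milvus-standalone), reconnect,
#    and query again: with the bug, pk == 1 shows up in the results.
print(collection.query(expr="pk >= 0", output_fields=["pk"]))
```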
If I apply a flush operation after the deletion, this behavior does not happen: removed vectors do not appear again after restarting the containers. I must say that after flushing manually, the segment and partition identifiers differ from when the flush is automatic. Check this image out.
Expected Behavior
I'd have expected deleting a vector to be consistent across a restart of the Milvus standalone container. From the documentation, it is not easy to tell whether we must perform a flush operation after deleting entities. In fact, I guess that sealing segments too often could lead to performance degradation if many small segments are created.
Steps To Reproduce
Milvus Log
No response
Anything else?
I've been able to keep a vector consistently gone from the system if I apply a flush on the collection after deleting it. Is this expected? I mean, is it mandatory to flush in order to permanently delete vector entities?
Many thanks in advance!