milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: NoSuchKey error in querynode after datacoord has deleted object #22866

Closed akevdmeer closed 1 year ago

akevdmeer commented 1 year ago

Environment

- Milvus version: 2.2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): n/a
- OS(Ubuntu or CentOS): Fedora CoreOS
- CPU/Memory: 48 Xeon cores, 768G
- GPU: no
- Others: n/a

Current Behavior

A querynode hits S3 NoSuchKey errors as it attempts to retrieve an S3 object that has been DELETEd by the datacoord.

We've hit this condition twice. Restarting the querynodes and the querycoord resolves the issue. That may be more restarting than necessary, but we have not been able to narrow it down because the problem occurs so rarely.

Expected Behavior

Querynodes do not attempt to retrieve S3 objects that have been DELETEd by the datacoord.

Steps To Reproduce

So far we have been unable to reproduce this.

Milvus Log

Here are log excerpts that relate most directly to the problem. They cover the creation of the object, its deletion, and the subsequent attempts to retrieve it. Please advise what further diagnostic information would be useful.

milvus-nosuchkey-excerpt.log

(Timestamps in the attached logs are partly UTC and partly local time. The timestamps on separate lines marked with '@' are consistently in our local time.)
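
For anyone lining up events across components, here is a minimal sketch of one way to put mixed UTC and local timestamps on a single timeline. It is not derived from the attached log: the timestamp format, the `LOCAL_TZ` offset, the `to_utc` helper, and the sample values are all assumptions made up for illustration and would need to be adjusted to the real log lines.

```python
from datetime import datetime, timedelta, timezone

# Assumption: UTC offset of the local zone; replace with the real one.
LOCAL_TZ = timezone(timedelta(hours=2))


def to_utc(stamp: str, is_local: bool) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' stamp and return it as an aware UTC datetime."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    source_tz = LOCAL_TZ if is_local else timezone.utc
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)


# Made-up placeholder events, not taken from the attached log.
events = [
    ("2023-01-01 12:00:05", True, "line stamped in local time"),
    ("2023-01-01 10:00:01", False, "line stamped in UTC"),
]

# Sort by the normalized UTC time so the true order of events is visible.
ordered = sorted((to_utc(stamp, is_local), label) for stamp, is_local, label in events)
for when, label in ordered:
    print(when.isoformat(), label)
```

Sorting on the normalized value makes it easier to confirm that the DELETE really precedes the failing retrievals, whichever clock each line was stamped with.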

Anything else?

No response

yanliang567 commented 1 year ago

/assign @soothing-rain /unassign

soothing-rain commented 1 year ago

@akevdmeer Hi there! Could you provide us with the full log? Could you also provide a backup of your etcd, taken with our https://github.com/milvus-io/birdwatcher tool?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
