milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: example insert error when milvus deploy in jingdong cloud #12072

Closed leadorzf closed 2 years ago

leadorzf commented 3 years ago

Is there an existing issue for this?

Environment

- Milvus version: 2.0.0rc6-2.0.0rc8
- Deployment mode(standalone or cluster):standalone
- SDK version(e.g. pymilvus v2.0.0rc2):v2.0.0rc6-v2.0.0rc8
- OS(Ubuntu or CentOS): ubuntu 18.04LTS
- CPU/Memory:  Core(TM) i7-8700K CPU @ 3.70GHz/32GB
- GPU:  NVIDIA Corporation GP104 [GeForce GTX 1080]
- Others:

Current Behavior

I run example.py locally with _HOST changed to the server's IP; the Milvus standalone server is running on a JD Cloud server. At the "insert" step, the following error is thrown:

Milvus python sdk version: 2.0.0rc8
Create connection...
List connections: [('default', <pymilvus.client.stub.Milvus object at 0x7fb7a770aa90>)]
Drop collection: demo
collection created: demo
list collections: ['demo']
Addr [114.67.225.7:19530] bulk_insert RPC error: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Connection timed out"
    debug_error_string = "{"created":"@1637217820.605611946","description":"Error received from peer ipv4:114.67.225.7:19530","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Connection timed out","grpc_status":14}"

{'API start': '2021-11-18 14:43:01.349244', 'RPC start': '2021-11-18 14:43:01.350342', 'RPC error': '2021-11-18 14:43:40.606280'}
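The failure above is a gRPC UNAVAILABLE (status 14) raised client-side roughly 39 seconds after the RPC started. When such errors are transient, a common client-side pattern is to retry with exponential backoff. The following is a minimal, generic sketch; the `Unavailable` exception and `flaky_insert` are illustrative stand-ins, not pymilvus APIs:

```python
import time

class Unavailable(Exception):
    """Stand-in for a transient gRPC UNAVAILABLE error."""

def retry_with_backoff(call, retries=3, base_delay=0.1):
    """Retry `call` with exponential backoff on Unavailable errors."""
    for attempt in range(retries):
        try:
            return call()
        except Unavailable:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Illustrative usage: a call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_insert():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise Unavailable("Connection timed out")
    return "inserted"

print(retry_with_backoff(flaky_insert))  # → inserted
```

Retrying only masks transient network hiccups; a consistent 39-second failure like the one above usually points at the network path itself rather than a flaky RPC.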

Expected Behavior

When Milvus runs on the same server as the client, or is deployed locally, the example runs correctly:

(base) root@kubernetes-master:~/milvus2.0/docker# python example.py
Milvus python sdk version: 2.0.0rc8

Create connection...

List connections: [('default', <pymilvus.client.stub.Milvus object at 0x7fdbd0f8a850>)]

collection created: demo

list collections: ['demo']

The number of entity: 10000

Created index: {'index_type': 'IVF_FLAT', 'params': {'nlist': 1024}, 'metric_type': 'L2'}

Search result for 0th vector:
Top 0: (distance: 14.90288257598877, id: 9545)
Top 1: (distance: 15.05495834350586, id: 3128)
Top 2: (distance: 15.076667785644531, id: 9362)

Search result for 1th vector:
Top 0: (distance: 0.0, id: 1)
Top 1: (distance: 13.708738327026367, id: 775)
Top 2: (distance: 14.773910522460938, id: 3570)

Search result for 2th vector:
Top 0: (distance: 0.0, id: 2)
Top 1: (distance: 13.089069366455078, id: 6154)
Top 2: (distance: 13.24781322479248, id: 1015)

Drop index sucessfully

Drop collection: demo

Steps To Reproduce

1. On the server: docker-compose -f milvus/deployments/docker/standalone/docker-compose.yml up -d
2. On the client machine, change _HOST in example.py to the server's IP
3. On the client machine: python example.py

Anything else?

No response

yanliang567 commented 3 years ago

@leadorzf how many entities did you insert in one batch? What is the latency between the client and the cloud, and is there any firewall configured? I am asking because the error indicates it could be an RPC timeout issue.
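To put a number on the latency question, one can time a plain TCP connect from the client to the server. This is a rough diagnostic sketch, not Milvus tooling; the host and port would be the Milvus endpoint (e.g. the server IP and 19530):

```python
import socket
import time

def tcp_connect_latency(host, port, timeout=5.0):
    """Measure the time to open a TCP connection, as a rough latency probe.

    Returns the elapsed time in seconds, or None if the connection
    could not be established within `timeout`.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Calling `tcp_connect_latency(server_ip, 19530)` a few times and averaging gives a rough round-trip estimate; a result of None suggests the port is blocked or unreachable.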

yanliang567 commented 3 years ago

/assign @leadorzf /unassign

leadorzf commented 3 years ago

> @leadorzf how many entities did you insert in one batch? how is the latency between the client and cloud, any firewall configured? I am asking because the error indicates it could be a rpc timeout issue

I have tested batches of 10 and 1000 in example.py, and the same error happened:

_DIM = 128
vectors = insert(collection, 10, _DIM)
vectors = insert(collection, 1000, _DIM)

I also tested the connection to the server with telnet 114.67.225.7 1930, which returns:

Trying 114.67.225.7...
Connected to 114.67.225.7.
Escape character is '^]'.

So I think the connection is OK.

But I have no idea about the latency or the firewall. Is there any test command I can use to figure it out?
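Aside from network checks, long-running inserts over a slow link are less likely to hit an RPC timeout if the data is sent in smaller batches, since each RPC stays short. A minimal batching sketch; `insert_fn` is a hypothetical stand-in for whatever call performs the actual insert, and the batch size of 256 is an arbitrary illustration:

```python
def chunked(rows, batch_size):
    """Yield successive batches of at most `batch_size` rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def insert_in_batches(insert_fn, rows, batch_size=256):
    """Insert rows via `insert_fn` one small batch at a time.

    Smaller batches keep each RPC short, which makes a timeout on a
    slow or lossy link less likely. Returns the total rows sent.
    """
    total = 0
    for batch in chunked(rows, batch_size):
        insert_fn(batch)
        total += len(batch)
    return total
```

For 1000 rows with batch_size=256 this issues four insert calls (256, 256, 256, 232 rows) instead of one large RPC.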

yanliang567 commented 3 years ago

@leadorzf could you please upload the milvus logs for further investigation?

leadorzf commented 3 years ago

[2021/11/19 12:38:41.414 +00:00] [DEBUG] [impl.go:2313] [RegisterLink] [role=proxy] ["state code of proxy"=Healthy]
[2021/11/19 12:38:42.455 +00:00] [DEBUG] [impl.go:399] ["DescribeCollection enqueue"] [role=proxy] [db=] [collection=demo]
[2021/11/19 12:38:42.455 +00:00] [DEBUG] [id.go:141] ["IDAllocator pickCanDoFunc"] [need=1] [total=189944] [remainReqCnt=0]
[2021/11/19 12:38:42.455 +00:00] [DEBUG] [impl.go:413] [DescribeCollection] [role=proxy] [msgID=429215041160226633] [timestamp=429215061751627780] [db=] [collection=demo]
[2021/11/19 12:38:42.455 +00:00] [DEBUG] [root_coord.go:1303] [DescribeCollection] [name=demo] [msgID=429215041160226633]
[2021/11/19 12:38:42.456 +00:00] [DEBUG] [root_coord.go:1323] ["DescribeCollection Succeeded"] [name=demo] [msgID=429215041160226633]
[2021/11/19 12:38:42.456 +00:00] [DEBUG] [impl.go:420] ["DescribeCollection Done"] [] [role=proxy] [msgID=429215041160226633] [timestamp=429215061751627780] [db=] [collection=demo]
[2021/11/19 12:38:44.403 +00:00] [DEBUG] [session_util.go:270] ["SessionUtil GetSessions"] [prefix=IndexNode] [resp="{\"header\":{\"cluster_id\":14841639068965178418,\"member_id\":10276657743932975437,\"revision\":29287,\"raft_term\":4},\"kvs\":[{\"key\":\"YnktZGV2L21ldGEvc2Vzc2lvbi9JbmRleE5vZGUtMjE=\",\"create_revision\":29073,\"mod_revision\":29073,\"version\":1,\"value\":\"eyJTZXJ2ZXJJRCI6MjEsIlNlcnZlck5hbWUiOiJJbmRleE5vZGUiLCJBZGRyZXNzIjoiMTcyLjE5LjAuNDoyMTEyMSJ9\",\"lease\":7587858627509139752}],\"count\":1}"]
[2021/11/19 12:38:47.403 +00:00] [DEBUG] [session_util.go:270] ["SessionUtil GetSessions"] [prefix=IndexNode] [resp="{\"header\":{\"cluster_id\":14841639068965178418,\"member_id\":10276657743932975437,\"revision\":29290,\"raft_term\":4},\"kvs\":[{\"key\":\"YnktZGV2L21ldGEvc2Vzc2lvbi9JbmRleE5vZGUtMjE=\",\"create_revision\":29073,\"mod_revision\":29073,\"version\":1,\"value\":\"eyJTZXJ2ZXJJRCI6MjEsIlNlcnZlck5hbWUiOiJJbmRleE5vZGUiLCJBZGRyZXNzIjoiMTcyLjE5LjAuNDoyMTEyMSJ9\",\"lease\":7587858627509139752}],\"count\":1}"]

leadorzf commented 3 years ago

I also found some warnings as follows:

[2021/11/19 13:03:31.680 +00:00] [WARN] [grpclog.go:46] ["[transport]transport: http2Server.HandleStreams failed to read frame: read tcp 172.19.0.4:19530->58.33.101.126:48538: read: connection reset by peer"]

xiaofan-luan commented 3 years ago

Try to reach 172.19.0.4:19530 and see if you have network access to the Milvus cluster. I guess 58.33.101.126 is your client's IP.

leadorzf commented 3 years ago

You mean the Milvus server needs to communicate with the Milvus client when the client inserts data to the server?

xiaofan-luan commented 3 years ago

> you mean the milvus server need commicate to milvus client when client insert data to server?

The client needs to connect to the server, though. You can try to ping from the client to the server to see if it works.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.