milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [GOSDK] No error is returned when inserting rows with extra fields that are not in the schema #33487

Open ThreadDao opened 4 months ago

ThreadDao commented 4 months ago


Environment

- Milvus version: master-20240528-b138ae74-amd64
- Deployment mode(standalone or cluster): 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): go-sdk v2 (2.4.0-dev)
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. Create a collection with 2 fields: int64 + floatVec.
  2. Inserting a row with int64, int32, and floatVec fields succeeds:
    [2024/05/30 19:44:44.968 +08:00] [DEBUG] [testcases/insert_test.go:267] ["Row data"] ["row[0]"="{\"int32\":1,\"int64\":1,\"floatVec\":[0.7953272,0.97875476,0.71185696,0.80443656,0.6120793,0.9971814,0.6910342,0.60436314]}"]
  3. Inserting column data with the same extra field returns an error: field int32 does not exist in collection xxx
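
The difference between the two paths can be sketched in Go. This is a minimal, self-contained illustration, not the Go SDK's code: `Field`, `Schema`, `rowsToColumnsLenient`, and `rowsToColumnsStrict` are hypothetical names, and the error text only imitates the message quoted in step 3.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a hypothetical, simplified schema field (not the SDK's entity.Field).
type Field struct {
	Name string
}

// Schema is a hypothetical, minimal collection schema.
type Schema struct {
	Fields []Field
}

// rowsToColumnsLenient mirrors the reported row-based behavior: it only
// fetches values for fields present in the schema and silently drops the rest.
func rowsToColumnsLenient(s Schema, rows []map[string]any) map[string][]any {
	cols := make(map[string][]any, len(s.Fields))
	for _, row := range rows {
		for _, f := range s.Fields {
			cols[f.Name] = append(cols[f.Name], row[f.Name])
		}
	}
	return cols
}

// rowsToColumnsStrict rejects any row key that is not a schema field,
// matching the error the column-based path reports.
func rowsToColumnsStrict(s Schema, collection string, rows []map[string]any) (map[string][]any, error) {
	known := make(map[string]bool, len(s.Fields))
	for _, f := range s.Fields {
		known[f.Name] = true
	}
	for _, row := range rows {
		// Sort keys so the reported field is deterministic.
		keys := make([]string, 0, len(row))
		for k := range row {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			if !known[k] {
				return nil, fmt.Errorf("field %s does not exist in collection %s", k, collection)
			}
		}
	}
	return rowsToColumnsLenient(s, rows), nil
}

func main() {
	schema := Schema{Fields: []Field{{Name: "int64"}, {Name: "floatVec"}}}
	rows := []map[string]any{{"int64": int64(1), "int32": int32(1), "floatVec": []float32{0.8, 0.9}}}

	// The int32 value is dropped without an error.
	fmt.Println(len(rowsToColumnsLenient(schema, rows))) // 2

	_, err := rowsToColumnsStrict(schema, "demo", rows)
	fmt.Println(err) // field int32 does not exist in collection demo
}
```

Making the row path strict (or routing extras somewhere explicit) is what gives rows and columns the same behavior.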

Expected Behavior

Row-based and column-based inserts should behave the same: inserting rows with extra fields should also return an error.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yanliang567 commented 4 months ago

/unassign

Please note that Go SDK v2 commits should go to the 2.4 branch as well.

congqixia commented 4 months ago

@ThreadDao The row-based API only fetches field data that exists in the schema. It looks like this behavior does not work for dynamic schema. Will fix that.
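
One way a row converter could treat undeclared keys differently depending on whether dynamic schema is enabled is sketched below. It is an illustration under assumptions, not the fix that was merged: `splitRow` is hypothetical, and the dynamic field name `$meta` plus the choice to store extras as one JSON value are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// dynamicFieldName is ASSUMED to be the hidden JSON field that holds
// undeclared keys when dynamic schema is enabled.
const dynamicFieldName = "$meta"

// splitRow (hypothetical) separates declared schema fields from extra keys.
// With dynamic schema enabled, extras are marshaled into one JSON value;
// otherwise they are reported as an error instead of being silently dropped.
func splitRow(declared map[string]bool, enableDynamic bool, row map[string]any) (map[string]any, error) {
	out := make(map[string]any, len(declared)+1)
	extras := make(map[string]any)
	for k, v := range row {
		if declared[k] {
			out[k] = v
		} else {
			extras[k] = v
		}
	}
	if len(extras) == 0 {
		return out, nil
	}
	if !enableDynamic {
		names := make([]string, 0, len(extras))
		for k := range extras {
			names = append(names, k)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("fields %v are not in the schema and dynamic schema is disabled", names)
	}
	blob, err := json.Marshal(extras)
	if err != nil {
		return nil, err
	}
	out[dynamicFieldName] = blob
	return out, nil
}

func main() {
	declared := map[string]bool{"int64": true, "floatVec": true}
	row := map[string]any{"int64": 1, "int32": 1, "floatVec": []float32{0.5}}

	withDyn, _ := splitRow(declared, true, row)
	fmt.Println(string(withDyn[dynamicFieldName].([]byte))) // {"int32":1}

	_, err := splitRow(declared, false, row)
	fmt.Println(err)
}
```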

xiaofan-luan commented 4 months ago

We need to discuss a meta cache policy. This will be very important for all SDKs, especially for handling schema-free collections.

  1. All SDKs cache the schema, keyed by collection name -> collection meta.
  2. All DML requests carry a collection timestamp.
  3. If the timestamp is smaller than the one in the proxy cache, the server should return an error and the client should refresh its cache.

xiaofan-luan commented 4 months ago

@czs007 will be working on this. Until then, we can remove all client-side checks and serialize all data into the request; the server side will check the fields and map them to the right positions.
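
The interim approach (client sends every column unchecked, server validates and reorders) might look like the sketch below. `ColumnData` and `mapColumns` are hypothetical names, the sketch assumes dynamic schema is disabled so unknown fields are rejected, and it is not the actual proxy code.

```go
package main

import "fmt"

// ColumnData is a hypothetical wire format: the client serializes every
// column it was given, keyed by name, without checking it against the schema.
type ColumnData struct {
	Name   string
	Values []any
}

// mapColumns sketches the server-side step: check each incoming column
// against the schema and place it at its schema position.
func mapColumns(schemaFields []string, cols []ColumnData) ([][]any, error) {
	pos := make(map[string]int, len(schemaFields))
	for i, name := range schemaFields {
		pos[name] = i
	}
	ordered := make([][]any, len(schemaFields))
	filled := make([]bool, len(schemaFields))
	for _, c := range cols {
		i, ok := pos[c.Name]
		if !ok {
			return nil, fmt.Errorf("field %s does not exist in schema", c.Name)
		}
		if filled[i] {
			return nil, fmt.Errorf("field %s is provided more than once", c.Name)
		}
		ordered[i], filled[i] = c.Values, true
	}
	// Report schema fields the request did not supply, in schema order.
	var missing []string
	for i, ok := range filled {
		if !ok {
			missing = append(missing, schemaFields[i])
		}
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing fields %v", missing)
	}
	return ordered, nil
}

func main() {
	schema := []string{"int64", "floatVec"}
	// The client sends columns in whatever order it has them.
	req := []ColumnData{
		{Name: "floatVec", Values: []any{[]float32{0.1, 0.2}}},
		{Name: "int64", Values: []any{int64(1)}},
	}
	out, err := mapColumns(schema, req)
	fmt.Println(out[0], err) // [1] <nil>

	_, err = mapColumns(schema, append(req, ColumnData{Name: "int32", Values: []any{int32(1)}}))
	fmt.Println(err) // field int32 does not exist in schema
}
```

Doing this check once on the server means every SDK gets the same behavior for rows and columns without duplicating schema logic on the client.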