milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.54k stars 2.83k forks source link

[Bug]: The inserted primary key is different from the returned result #20415

Closed larry88 closed 6 months ago

larry88 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version:2.1.4
- Deployment mode(standalone or cluster): k8s
- SDK version(e.g. pymilvus v2.0.0rc2): restapi
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 96c/512G
- GPU: no
- Others:

Current Behavior

when I insert data, the key number is less than 64 bits, but it will overflow

4B77A642-729E-477B-9677-2727F4154E43 DF8BB737-9E60-4E5D-A563-4499AF19FE01

Insert number: 9999999999999999 Max Int64: 9223372036854775807

Expected Behavior

The inserted primary key number is equal with the returned result

Steps To Reproduce

1. excute command:
 curl -X 'POST' 'http://172.19.80.95:31601/api/v1/entities'  -H 'accept: application/json' -H 'Content-Type: application/json' \
-d '{
"collection_name": "seller_tag",
"fields_data": [{ "field_name":"kid","type":5,"field":[9999999999999999] },{ "field_name":"seller_id","type":5,"field":[5625300123280813090] },{ "field_name":"vector","type":101,"field":[[0.08090433, 0.19154754, 0.16858263, 0.027101958, 0.07229418, 0.15223257, -0.024227709, 0.13302892, 0.05951315, 0.03572949, -0.015721956, -0.21992287, 0.08134472, 0.18640009, -0.09814235, -0.11117617, 0.10464557, -0.092037976, -0.19489805, -0.069008306, -0.039415136, -0.17841195, 0.076126315, 0.031378396, 0.22680397, 0.045089707, 0.12307317, 0.06711619, 0.15067382, -0.213569, 0.066602595, -0.021743167, -0.2727193, -0.112709574, 0.09504322, 0.02386695, 0.04574049, -0.055642836, -0.16812482, -0.051256, -0.11399734, 0.29519975, 0.109542266, 0.18452083, 0.05543076, -0.064969495, -0.14457555, -0.034600936, 0.045484997, -0.15677887, -0.12983392, 0.20921704, -0.049788076, 0.050687622, -0.23369887, -0.022488454, 0.06089106, 0.14699098, -0.08140416, -0.008949298, -0.14867777, 0.07415456, -0.0027948048, 0.0060837376]] }],
"num_rows": 1
}'

2. response is :
{"status":{},"IDs":{"IdField":{"IntId":{"data":[10000000000000000]}}},"succ_index":[0],"insert_cnt":1,"timestamp":437233704314077186}

Milvus Log

no

Anything else?

no

yanliang567 commented 1 year ago

/assign @binbinlv please help to try with pymilvus to check whether it is a sdk issue or not?

/unassign

binbinlv commented 1 year ago

/assign @binbinlv please help to try with pymilvus to check whether it is a sdk issue or not?

/unassign

Ok, working.

binbinlv commented 1 year ago

Using pymilvus 2.1.3, milvus 2.1.4, the result is right:

>>>
>>> from pymilvus import CollectionSchema, FieldSchema

>>> from pymilvus import Collection
>>> from pymilvus import connections
>>> from pymilvus import DataType
>>> from pymilvus import Partition
>>> from pymilvus import utility
>>> connections.connect()
>>> int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
>>> int64_field_1 = FieldSchema(name="int641", dtype=DataType.INT64)
>>> dim=64
>>>
>>> float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
>>> schema = CollectionSchema(fields=[int64_field, int64_field_1, float_vector])
>>> collection = Collection("test_search_collection_binbin_0", schema=schema)
>>> import numpy as np
>>> res = collection.insert([[9999999999999999], [5625300123280813090], [[0.08090433, 0.19154754, 0.16858263, 0.027101958, 0.07229418, 0.15223257, -0.024227709, 0.13302892, 0.05951315, 0.03572949, -0.015721956, -0.21992287, 0.08134472, 0.18640009, -0.09814235, -0.11117617, 0.10464557, -0.092037976, -0.19489805, -0.069008306, -0.039415136, -0.17841195, 0.076126315, 0.031378396, 0.22680397, 0.045089707, 0.12307317, 0.06711619, 0.15067382, -0.213569, 0.066602595, -0.021743167, -0.2727193, -0.112709574, 0.09504322, 0.02386695, 0.04574049, -0.055642836, -0.16812482, -0.051256, -0.11399734, 0.29519975, 0.109542266, 0.18452083, 0.05543076, -0.064969495, -0.14457555, -0.034600936, 0.045484997, -0.15677887, -0.12983392, 0.20921704, -0.049788076, 0.050687622, -0.23369887, -0.022488454, 0.06089106, 0.14699098, -0.08140416, -0.008949298, -0.14867777, 0.07415456, -0.0027948048, 0.0060837376]] ])
>>>
>>>
>>> res
(insert count: 1, delete count: 0, upsert count: 0, timestamp: 437246367618301954, success count: 1, err count: 0)
>>> res.primary_keys
[9999999999999999]
binbinlv commented 1 year ago

Using milvus 2.1.4, restAPI, this issue is reproduced: id from 9999999999999999 to 10000000000000000

  1. connect

    curl ***:9091/api/v1/health
    {"status":"ok"}
  2. create collection:

    curl -X 'POST' \
    'http://***:9091/api/v1/collection' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "collection_name": "seller_tag",
    "schema": {
      "autoID": false,
      "description": "Test book search",
      "fields": [
        {
          "name": "kid",
          "description": "book id",
          "is_primary_key": true,
          "autoID": false,
          "data_type": 5
        },
        {
          "name": "seller_id",
          "description": "count of words",
          "is_primary_key": false,
          "data_type": 5
        },
        {
          "name": "vector",
          "description": "embedded vector of book introduction",
          "data_type": 101,
          "is_primary_key": false,
          "type_params": [
            {
              "key": "dim",
              "value": "64"
            }
          ]
        }
      ],
      "name": "seller_tag"
    }
    }'
    {}
  3. insert:

curl -X 'POST' 'http://***:9091/api/v1/entities'  -H 'accept: application/json' -H 'Content-Type: application/json' \
-d '{
"collection_name": "seller_tag",
"fields_data": [{ "field_name":"kid","type":5,"field":[9999999999999999] },{ "field_name":"seller_id","type":5,"field":[5625300123280813090] },{ "field_name":"vector","type":101,"field":[[0.08090433, 0.19154754, 0.16858263, 0.027101958, 0.07229418, 0.15223257, -0.024227709, 0.13302892, 0.05951315, 0.03572949, -0.015721956, -0.21992287, 0.08134472, 0.18640009, -0.09814235, -0.11117617, 0.10464557, -0.092037976, -0.19489805, -0.069008306, -0.039415136, -0.17841195, 0.076126315, 0.031378396, 0.22680397, 0.045089707, 0.12307317, 0.06711619, 0.15067382, -0.213569, 0.066602595, -0.021743167, -0.2727193, -0.112709574, 0.09504322, 0.02386695, 0.04574049, -0.055642836, -0.16812482, -0.051256, -0.11399734, 0.29519975, 0.109542266, 0.18452083, 0.05543076, -0.064969495, -0.14457555, -0.034600936, 0.045484997, -0.15677887, -0.12983392, 0.20921704, -0.049788076, 0.050687622, -0.23369887, -0.022488454, 0.06089106, 0.14699098, -0.08140416, -0.008949298, -0.14867777, 0.07415456, -0.0027948048, 0.0060837376]] }],
"num_rows": 1
}'
{"status":{},"IDs":{"IdField":{"IntId":{"data":[10000000000000000]}}},"succ_index":[0],"insert_cnt":1,"timestamp":437249762235777026}
binbinlv commented 1 year ago

Reproduced only in restAPI, so maybe the sdk issue, @haorenfsa could you help to have a look please? Thanks.

binbinlv commented 1 year ago

/assign @haorenfsa /unassign

haorenfsa commented 1 year ago

It's a bug due to careless golang interface conversion in parsing request. It's not an overflow but a precision lost when converting from float64 to int64.

https://github.com/milvus-io/milvus/blob/master/internal/distributed/proxy/httpserver/wrap_request.go#L133

here is a simple proof test of it:

func Test_convert(t *testing.T) {
    type val struct {
        A interface{} `json:"a"`
    }
    raw := `{"a": 9999999999999999}`
    var v val
    err := json.Unmarshal([]byte(raw), &v)
    assert.Nil(t, err)
    assert.Equal(t, int64(9999999999999999), int64(v.A.(float64)))
}

which outputs

        Error:          Not equal: 
                            expected: 9999999999999999
                            actual  : 10000000000000000
        Test:           Test_convert
haorenfsa commented 1 year ago

Same problem may also apply to other sdks, whose language uses float64 for number types. eg: node sdk with typescript. Can you help to test & prove it?

haorenfsa commented 1 year ago

Same problem may also apply to other sdks, whose language uses float64 for number types. eg: node sdk with typescript. Can you help to test & prove it?

16951

xiaofan-luan commented 1 year ago
Unmarshal

So json unmarshal treat all data types as float?

haorenfsa commented 1 year ago

So json unmarshal treat all data types as float?

all number data, yes, as well as many other major json libs

haorenfsa commented 1 year ago

FYI It's stated in the RFC standard of JSON: https://datatracker.ietf.org/doc/html/rfc8259

This specification allows implementations to set limits on the range
   and precision of numbers accepted.  Since software that implements
   IEEE 754 binary64 (double precision) numbers [[IEEE754](https://datatracker.ietf.org/doc/html/rfc8259#ref-IEEE754)] is generally
   available and widely used, good interoperability can be achieved by
   implementations that expect no more precision or range than these
   provide, in the sense that implementations will approximate JSON
   numbers within the expected precision.  A JSON number such as 1E400
   or 3.141592653589793238462643383279 may indicate potential
   interoperability problems, since it suggests that the software that
   created it expects receiving software to have greater capabilities
   for numeric magnitude and precision than is widely available.

   Note that when such software is used, numbers that are integers and
   are in the range [-(2**53)+1, (2**53)-1] are interoperable in the
   sense that implementations will agree exactly on their numeric
   values.
haorenfsa commented 1 year ago

@larry88 thanks again for this feedback. It really saved us from many troubles to come.

binbinlv commented 1 year ago

Same problem may also apply to other sdks, whose language uses float64 for number types. eg: node sdk with typescript. Can you help to test & prove it?

@nameczz Yes, same problem using Node SDK. insert: (id) 9999999999999999 return: (id) 10000000000000000 node.js:

const {MilvusClient} = require("@zilliz/milvus2-sdk-node")
const address = "***:19530";
const milvusClient = new MilvusClient(address);
const params = {
  collection_name: "book",
  description: "Test book search",
  fields: [
    {
      name: "book_id",
      data_type: 5,   //DataType.Int64
      is_primary_key: true,
      description: "",
    },
    {
      name: "book_id_1",
      data_type: 5,   //DataType.Int64
      description: "",
    },
    {
      name: "book_intro",
      description: "",
      data_type: 101,  // DataType.FloatVector
      type_params: {
        dim: "64",
      },
    },
  ],
};
(async () => {
   await milvusClient.collectionManager.createCollection(params);

   const data = [{"book_id":9999999999999999,"book_id_1":5625300123280813090, "book_intro":[0.08090433, 0.19154754, 0.16858263, 0.027101958, 0.07229418, 0.15223257, -0.024227709, 0.13302892, 0.05951315, 0.03572949, -0.015721956, -0.21992287, 0.08134472, 0.18640009, -0.09814235, -0.11117617, 0.10464557, -0.092037976, -0.19489805, -0.069008306, -0.039415136, -0.17841195, 0.076126315, 0.031378396, 0.22680397, 0.045089707, 0.12307317, 0.06711619, 0.15067382, -0.213569, 0.066602595, -0.021743167, -0.2727193, -0.112709574, 0.09504322, 0.02386695, 0.04574049, -0.055642836, -0.16812482, -0.051256, -0.11399734, 0.29519975, 0.109542266, 0.18452083, 0.05543076, -0.064969495, -0.14457555, -0.034600936, 0.045484997, -0.15677887, -0.12983392, 0.20921704, -0.049788076, 0.050687622, -0.23369887, -0.022488454, 0.06089106, 0.14699098, -0.08140416, -0.008949298, -0.14867777, 0.07415456, -0.0027948048, 0.0060837376]}]

   const mq = await milvusClient.dataManager.insert({
      collection_name: "book",
      fields_data: data,
   });
   console.log(mq);
   await milvusClient.collectionManager.dropCollection({  collection_name: "book",});
   console.log("Dropped collction");
   console.log(mq.IDs.int_id.data[0]);

})();
yhmo commented 1 year ago

Bulkinsert also has the same problem. I have submitted a commit for bulkinsert: https://github.com/milvus-io/milvus/pull/20506/files

The encoding/json lib uses float64 for all numeric values by default. We can ask the json lib to return string value instead of float64 value in this way:

dec := json.NewDecoder(r)
dec.UseNumber()

The json decoder will return a string type (json.Number) for us, then we can convert the string type value to int64 value in this way: value, err := strconv.ParseInt(string(obj.(json.Number)), 10, 64)

xiaofan-luan commented 1 year ago

/assign @haorenfsa would you like to fix it?

haorenfsa commented 1 year ago

/assign @haorenfsa would you like to fix it?

Fix in #20468, need review

haorenfsa commented 1 year ago

/reopen

solved for:

not solved for:

sre-ci-robot commented 1 year ago

@haorenfsa: Reopened this issue.

In response to [this](https://github.com/milvus-io/milvus/issues/20415#issuecomment-1327100556): >/reopen > >not solved in node sdk & attu Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
yanliang567 commented 1 year ago

@haorenfsa any updates?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 1 year ago

keep it active

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

haorenfsa commented 1 year ago

/reopen

solved for:

  • [x] bulkinsert
  • [x] restapi

not solved for:

  • [ ] node sdk
  • [ ] attu

@yanliang567 still same as above

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

haorenfsa commented 8 months ago

/reopen

sre-ci-robot commented 8 months ago

@haorenfsa: Reopened this issue.

In response to [this](https://github.com/milvus-io/milvus/issues/20415#issuecomment-1897922579): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
hemangjoshi37a commented 8 months ago

The issue with the primary key precision in Milvus is identified as a bug in the REST API and Node SDK due to the handling of int64 fields. As noted, the problem arises from the JSON unmarshaling process, which treats numeric values as float64, leading to precision loss.

For immediate action:

  1. A temporary workaround could be to use the PyMilvus SDK as it doesn't exhibit this behavior.
  2. Continue monitoring updates in the Milvus repository for the resolution in the Node SDK and REST API.

A permanent fix is in progress, and the relevant pull requests and updates will address this issue in upcoming releases. Please keep an eye on the release notes for version 2.2.13 for the resolution.

Thank you for your patience and contributions to improving Milvus.

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 7 months ago

@haorenfsa was it fixed?

haorenfsa commented 7 months ago

nope

PowderLi commented 7 months ago

related to #27856

haorenfsa commented 7 months ago

/assign @PowderLi /assign @shanghaikid

haorenfsa commented 7 months ago

@yanliang567 current status:

solved for:

not solved for:

PowderLi commented 6 months ago

/assign @binbinlv

zhuwenxing commented 6 months ago

It works well in restful v2 api

method: post, 
url: http://10.104.26.106:19530/v2/vectordb/entities/insert, 
cost time: 0.1798231601715088, 
header: {'Content-Type': 'application/json', 'Authorization': 'Bearer root:Milvus', 'Accept-Type-Allow-Int64': 'true', 'RequestId': 'dbc24f8a-d6cf-11ee-9db0-b29c4acc8bb7'}, 
payload: {"collectionName": "test_collection_2024_02_29_14_57_51_418020wtbZttvu", "data": [{"book_id": 9999999999999999, "user_id": 0, "word_count": 0, "book_describe": "book_0", "text_emb": [0.634495385666520...61772868946]}, {"book_id": 10000000000000000, "user_id": 1, "word_count": 1, "book_describe": "book_1", "text_emb": [0.4500510428322396, 0.27717461703918467, 0.06460117570971208, 0.846436635910638]}]}, 
response: {"code":200,"data":{"insertCount":2,"insertIds":[9999999999999999,10000000000000000]}}
binbinlv commented 6 months ago

Using milvus "master-20240229-dcfc3531-amd64" restAPI V1, this issue is fixed too:

result: insert 9999999999999999:

{}{"status":{},"IDs":{"IdField":{"IntId":{"data":[9999999999999999]}}},"succ_index":[0],"insert_cnt":1,"timestamp":448054154404823044}

insert 10000000000000000:

{}{"status":{},"IDs":{"IdField":{"IntId":{"data":[10000000000000000]}}},"succ_index":[0],"insert_cnt":1,"timestamp":448054226416041987}
haorenfsa commented 6 months ago

@yanliang567 current status:

solved for:

not solved for:

shanghaikid commented 6 months ago

https://github.com/milvus-io/milvus-sdk-node/pull/279

@haorenfsa node system doesn't support int64 natively, I've added one solution which is using long.js to insert or resolve this data-type.

PowderLi commented 6 months ago

/unassign

haorenfsa commented 6 months ago

https://github.com/milvus-io/milvus-sdk-node/pull/279 @haorenfsa node system doesn't support int64 natively, I've added one solution which is using long.js to insert or resolve this data-type.

@binbinlv please help verify

binbinlv commented 6 months ago

milvus-io/milvus-sdk-node#279

@haorenfsa node system doesn't support int64 natively, I've added one solution which is using long.js to insert or resolve this data-type.

I think there is no need to test node-sdk now, node system doesn't support int64, so there will be issues for the following code, insert "9999999999999999" using nodesdk, and it returns 10000000000000000. And if tests the solution, I found that @shanghaikid has already add the test case in the repo.

(async () => {
   await milvusClient.collectionManager.createCollection(params);

   const data = [{"book_id":9999999999999999,"book_id_1":5625300123280813090, "book_intro":[0.08090433, 0.19154754, 0.16858263, 0.027101958, 0.07229418, 0.15223257, -0.024227709, 0.13302892, 0.05951315, 0.03572949, -0.015721956, -0.21992287, 0.08134472, 0.18640009, -0.09814235, -0.11117617, 0.10464557, -0.092037976, -0.19489805, -0.069008306, -0.039415136, -0.17841195, 0.076126315, 0.031378396, 0.22680397, 0.045089707, 0.12307317, 0.06711619, 0.15067382, -0.213569, 0.066602595, -0.021743167, -0.2727193, -0.112709574, 0.09504322, 0.02386695, 0.04574049, -0.055642836, -0.16812482, -0.051256, -0.11399734, 0.29519975, 0.109542266, 0.18452083, 0.05543076, -0.064969495, -0.14457555, -0.034600936, 0.045484997, -0.15677887, -0.12983392, 0.20921704, -0.049788076, 0.050687622, -0.23369887, -0.022488454, 0.06089106, 0.14699098, -0.08140416, -0.008949298, -0.14867777, 0.07415456, -0.0027948048, 0.0060837376]}]

   const mq = await milvusClient.dataManager.insert({
      collection_name: "book",
      fields_data: data,
   });
   console.log(mq);
   await milvusClient.collectionManager.dropCollection({  collection_name: "book",});
   console.log("Dropped collction");
   console.log(mq.IDs.int_id.data[0]);

})();
shanghaikid commented 6 months ago

refer: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MIN_SAFE_INTEGER

haorenfsa commented 6 months ago

@shanghaikid by the way, is attu's code updated to use the loaderOptions ?

shanghaikid commented 6 months ago

Attu is not updated. it doesn't have a display issue, by default, all int64 data will be converted to string type.

Nothing is Impossible to Willing Heart!!!

On Fri, Mar 8, 2024 at 11:12 AM shaoyue @.***> wrote:

@shanghaikid https://github.com/shanghaikid by the way, is attu's code updated to use the loaderOptions ?

— Reply to this email directly, view it on GitHub https://github.com/milvus-io/milvus/issues/20415#issuecomment-1984968624, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABNFWZ3KQNNBZFVJRHKHG3YXEUBZAVCNFSM6AAAAAAR2KMYCGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBUHE3DQNRSGQ . You are receiving this because you were mentioned.Message ID: @.***>

haorenfsa commented 6 months ago

@yanliang567 ok to close, all solved

haorenfsa commented 6 months ago

/assign @yanliang567

yanliang567 commented 6 months ago

/assign @zhuwenxing could you please help to verify the fixes /unassign