[Enhancement]: schema validation is fragile

zhengbuqian commented 1 month ago

Is there an existing issue for this?

[X] I have searched the existing issues

What would you like to be added?

Currently we validate the server schema by comparing it with the user-provided schema: https://github.com/milvus-io/pymilvus/blob/76de0ab1caba89a939a784545e1a61d13ea139a3/pymilvus/orm/collection.py#L134.

With doc-in-doc-out, we introduced tokenizer_params in params, which is a dict in user input but a JSON string in the server response. Comparing the two directly causes a failure.

For now tokenizer_params is simple, so I used a temporary workaround in https://github.com/milvus-io/pymilvus/pull/2298 that converts the JSON string back to a dict. That will likely fail once we introduce more configs in tokenizer_params: keys in the JSON may be reordered and the resulting dict will no longer be equal.
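One way to make the check less fragile, as a minimal sketch only: parse the JSON-encoded values on both sides and compare the parsed objects rather than the raw strings, which keeps the check independent of key order in the JSON text. The names `JSON_ENCODED_KEYS`, `normalize_params`, and `params_equal` are hypothetical and not part of pymilvus, and the sample tokenizer config is made up for illustration.

```python
import json

# Assumption: keys whose values the server returns as JSON strings.
JSON_ENCODED_KEYS = ("tokenizer_params",)


def normalize_params(params: dict) -> dict:
    """Return a copy of params with JSON-string values parsed into objects."""
    out = dict(params)
    for key in JSON_ENCODED_KEYS:
        if isinstance(out.get(key), str):
            out[key] = json.loads(out[key])
    return out


def params_equal(user: dict, server: dict) -> bool:
    """Compare two params dicts after normalizing JSON-encoded values."""
    # Python dict equality does not depend on key order,
    # so only the parsed keys and values matter here.
    return normalize_params(user) == normalize_params(server)


# Hypothetical sample data: dict on the user side, JSON string on the server side.
user_params = {
    "max_length": 100,
    "tokenizer_params": {"tokenizer": "default", "lang": "en"},
}
server_params = {
    "max_length": 100,
    "tokenizer_params": '{"lang": "en", "tokenizer": "default"}',
}
print(params_equal(user_params, server_params))
```

This only covers differences in encoding and key order. If the server fills in defaults or rewrites values, those cases would need their own handling.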

Why is this needed?

No response

Anything else?

No response

zhengbuqian commented 1 month ago

/assign @zhengbuqian
/assign @XuanYang-cn

zhengbuqian commented 1 month ago

this should be fixed before the Milvus 2.5 release

XuanYang-cn commented 1 month ago

@zhengbuqian we could implement an __eq__ method in the schema to customize what needs to be compared and what can be ignored.
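A rough sketch of that idea, not the actual pymilvus classes: `FieldSketch` is a hypothetical stand-in for a field schema, and the choice to ignore `description` is only an assumption to show how an attribute can be left out of the comparison.

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False so the hand-written __eq__ below is used
class FieldSketch:
    """Hypothetical stand-in for a field schema; not a pymilvus class."""

    name: str
    dtype: str
    params: dict = field(default_factory=dict)
    description: str = ""

    def _comparable(self) -> tuple:
        """Build the tuple of attributes that take part in equality."""
        params = dict(self.params)
        raw = params.get("tokenizer_params")
        if isinstance(raw, str):
            # The server side may hold a JSON string; parse it before comparing.
            params["tokenizer_params"] = json.loads(raw)
        # description is deliberately left out (assumption for illustration).
        return (self.name, self.dtype, params)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FieldSketch):
            return NotImplemented
        return self._comparable() == other._comparable()


user_field = FieldSketch(
    "text", "VARCHAR", {"tokenizer_params": {"tokenizer": "default"}}, "user text"
)
server_field = FieldSketch(
    "text", "VARCHAR", {"tokenizer_params": '{"tokenizer": "default"}'}
)
print(user_field == server_field)
```

Keeping the comparison rules in one `_comparable` method means new server-side encodings or ignorable attributes can be handled in a single place.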

Is Function going to change for the same collection? If so, we should probably ignore it when validating the schema.
What does tokenizer_params look like in Milvus? Please give an example of a typical tokenizer_params, thanks.
