milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Modify the collection schema once collection is created and is not empty #20405

Open jeet129 opened 2 years ago

jeet129 commented 2 years ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Many times, when we start with an ANN collection definition, we don't know the exhaustive list of fields the use case will need. We create the collection with the few fields we know about, and as the application evolves we need to add to or modify the schema defined earlier to accommodate more attributes.

Without this capability, the only option is to recreate the collection and re-ingest all the data, which is not an easy choice given the long data ingestion pipelines for huge collections.

Describe the solution you'd like.

We need a way to add new fields (optional fields, or fields with default values) to a collection and to drop existing non-primary fields from it. This way the same collection can serve the different scenarios of a use case without creating a new collection and rehydrating it with data.

There should also be an option to update the values of such attributes for existing entities.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

zyy20191 commented 2 years ago

The ability to add a field to an already created collection would be really convenient, and I hope you can consider this requirement.

xiaofan-luan commented 1 year ago

Let's keep it. Agreed, this is a very useful feature, but it requires a lot of effort, so if anyone has time, please take it. Otherwise we will wait until the performance/stability issues are solved and then start working on it.

sskserk commented 2 months ago

Hey Milvus Developers and Community,

Would it be feasible to implement a feature that includes basic routines for renaming, adding, or dropping columns? Even a simple set of these functions could significantly enhance our capabilities.

The use case is straightforward but has a substantial impact:

A major challenge we face is determining how to properly migrate the data.

Currently, we are compelled to recreate the entire collection from scratch whenever an additional or modified field is needed. Given the vast amounts of data involved, this process is exceedingly challenging.

Providing a command-line tool that could handle these modifications would offer significant relief and improve our efficiency.

I also suspect that physically modifying an existing collection may be practically impossible, since it might require changing the vector data, which is computationally challenging.

P.S. I would be happy to collaborate with someone or assist with a corresponding MR.

xiaofan-luan commented 2 months ago

This is definitely already on our roadmap.

@tedxu and @smellthemoon are actually working on it, so hopefully that will help.

@smellthemoon, could you please follow up with @sskserk and see how it can work with our latest modify-schema feature?

sskserk commented 2 months ago

@xiaofan-luan , @tedxu , @smellthemoon,

I am eager to test the new feature and am looking forward to receiving it. I'm ready to test a prerelease of this feature; I just need to know when.

The implementation of this feature will undoubtedly mark a significant milestone. I anticipate that, as a result, a new Milvus-related product similar to "Flyway" might emerge in the future.

Your solution is already widely adopted by major companies, and this enhancement will further solidify its enterprise-grade capabilities.

Thank you for the positive update!

xiaofan-luan commented 2 months ago

Cool, let's ship it.

smellthemoon commented 2 months ago

In fact, the add-field feature is already in our development plan. Users will be able to add a new column through an add-field operation, and the values of this new column for existing entities are all null. After the add-field operation completes, insert/upsert requests must include data for the new column. I will keep you updated on any progress. @sskserk
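To make those semantics concrete, here is a minimal in-memory sketch in TypeScript. It is not the Milvus API: the `Collection`, `FieldSchema`, `addField`, and `insert` names are hypothetical, and the rule that an added field must be nullable is an assumption inferred from "the values in this new column are all null". It shows the two behaviours described: existing entities are backfilled with null, and later inserts must carry the new column.

```typescript
// Hypothetical model of add-field semantics; none of these names come from a Milvus SDK.

interface FieldSchema {
  name: string;
  nullable: boolean;
}

type Row = Record<string, unknown>;

interface Collection {
  fields: FieldSchema[];
  rows: Row[];
}

// Adds a column and backfills null for every existing entity.
function addField(coll: Collection, field: FieldSchema): Collection {
  // Assumption: existing entities can only hold null if the new field is nullable.
  if (!field.nullable) {
    throw new Error(`field "${field.name}" must be nullable to be added to a non-empty collection`);
  }
  if (coll.fields.some((f) => f.name === field.name)) {
    throw new Error(`field "${field.name}" already exists`);
  }
  return {
    fields: [...coll.fields, field],
    rows: coll.rows.map((r) => ({ ...r, [field.name]: null })),
  };
}

// After the schema change, every insert/upsert row must carry every column, including the new one.
function insert(coll: Collection, row: Row): Collection {
  for (const f of coll.fields) {
    if (!(f.name in row)) {
      throw new Error(`missing data for field "${f.name}"`);
    }
  }
  return { ...coll, rows: [...coll.rows, row] };
}

let c: Collection = {
  fields: [{ name: "id", nullable: false }],
  rows: [{ id: 1 }, { id: 2 }],
};
c = addField(c, { name: "category", nullable: true });
console.log(JSON.stringify(c.rows)); // [{"id":1,"category":null},{"id":2,"category":null}]

try {
  insert(c, { id: 3 });
} catch (e) {
  console.log((e as Error).message); // missing data for field "category"
}
```

Returning new objects instead of mutating keeps the "before" and "after" schemas side by side, which makes the backfill easy to inspect.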

smellthemoon commented 2 months ago

/assign

xiaofan-luan commented 1 month ago

In 2.0 we support null/default values. The target for 3.0 is to support schema changes.

iamkhalidbashir commented 1 month ago

Is there any null value for an embedding field? With the JS library, if we pass null for an embedding field, we get this error:

Error processing PDF: TypeError: Cannot read properties of null (reading 'length')
    at Function.concat (node:buffer:589:19)
    at /app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:241:47
    at Array.map (<anonymous>)
    at MilvusClient.<anonymous> (/app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:218:64)
    at Generator.next (<anonymous>)
    at fulfilled (/app/node_modules/@zilliz/milvus2-sdk-node/dist/milvus/grpc/Data.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
xiaofan-luan commented 1 month ago

Embeddings cannot be null.

A scalar (non-vector) field can be null only if it has nullable enabled, which is available starting with Milvus 2.5.
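A client application can enforce these two rules before calling the SDK, so a bad batch is rejected with a readable message instead of the `TypeError` in the stack trace above. The sketch below is a hypothetical helper, not part of `@zilliz/milvus2-sdk-node`. `FieldDef`, `Entity`, and `findNullViolations` are illustrative names, and treating a missing key (`undefined`) the same as `null` is an assumption.

```typescript
// Hypothetical pre-insert check mirroring the nullability rules above.
// Not part of @zilliz/milvus2-sdk-node.

type FieldKind = "vector" | "scalar";

interface FieldDef {
  name: string;
  kind: FieldKind;
  nullable?: boolean; // only meaningful for scalar fields; vectors are never nullable
}

type Entity = Record<string, unknown>;

// Returns one message per offending (row, field) pair; an empty array means the batch passes.
function findNullViolations(fields: FieldDef[], rows: Entity[]): string[] {
  const problems: string[] = [];
  rows.forEach((row, i) => {
    for (const f of fields) {
      const v = row[f.name];
      // Assumption: a missing key is treated the same as an explicit null.
      if (v !== null && v !== undefined) continue;
      if (f.kind === "vector") {
        problems.push(`row ${i}: vector field "${f.name}" cannot be null`);
      } else if (!f.nullable) {
        problems.push(`row ${i}: scalar field "${f.name}" is not nullable`);
      }
    }
  });
  return problems;
}

const schema: FieldDef[] = [
  { name: "embedding", kind: "vector" },
  { name: "title", kind: "scalar", nullable: true },
  { name: "page", kind: "scalar" },
];

// Row 0 passes (null title is allowed); row 1 yields two messages (embedding and page).
console.log(
  findNullViolations(schema, [
    { embedding: [0.1, 0.2], title: null, page: 1 },
    { embedding: null, title: "doc", page: null },
  ]),
);
```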

@smellthemoon Do we support altering a non-nullable field to nullable?