milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.69k stars, 2.93k forks

[Feature]: Import data (e.g. - vectors in parquet files) into milvus standalone with local storageType #36445

Open liorf95 opened 1 month ago

liorf95 commented 1 month ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Currently, it is not possible to import data (e.g., vectors in Parquet files) into Milvus standalone when it is configured with the local storageType; a remote MinIO instance is required.

Describe the solution you'd like.

Support importing data (e.g., vectors in Parquet files) into Milvus standalone with the local storageType, without requiring a remote MinIO instance.

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

xiaofan-luan commented 1 month ago

/assign @bigsheeper please help on it

bigsheeper commented 1 month ago

Hi @liorf95 ,

To simplify the local import process, we recommend using our bulkwriter tool. It is specifically designed to handle bulk imports efficiently. You can find the detailed instructions for setting up and using bulkwriter in our documentation here: https://milvus.io/api-reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md.

Please let us know if you need any assistance with the setup or have any questions regarding the tool.

bigsheeper commented 1 month ago

/assign @liorf95

liorf95 commented 1 month ago

> Hi @liorf95 ,
>
> To simplify the local import process, we recommend using our bulkwriter tool. It is specifically designed to handle bulk imports efficiently. You can find the detailed instructions for setting up and using bulkwriter in our documentation here: https://milvus.io/api-reference/pymilvus/v2.4.x/DataImport/LocalBulkWriter/LocalBulkWriter.md.
>
> Please let us know if you need any assistance with the setup or have any questions regarding the tool.

This is exactly what I did, but LocalBulkWriter does not work for this, as described in bug https://github.com/milvus-io/milvus/issues/35530 (only RemoteBulkWriter works).

bigsheeper commented 1 month ago

Hi @lhotari ,

My apologies for the confusion earlier. I’d like to clarify the usage of the tools:

  1. A LocalBulkWriter instance rewrites your raw data locally into a format that Milvus understands. It does not upload or import anything by itself, so it is useful if you want to preprocess the data before uploading it to Milvus.
  2. If you want to import your data into Milvus directly, we recommend using RemoteBulkWriter instead, which writes the converted files to remote object storage and can simplify the import process.
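
To make the difference between the two writers concrete, here is a minimal sketch, assuming pymilvus 2.4.x and its `pymilvus.bulk_writer` module. The helper names (`make_rows`, `build_schema`, `write_local`, `write_remote`), the field names `id` and `vector`, the dimension `DIM`, and all endpoint, credential, and bucket values are hypothetical placeholders, not something taken from this thread.

```python
import random
from typing import Dict, List

DIM = 8  # hypothetical vector dimension, for illustration only


def make_rows(n: int, dim: int = DIM, seed: int = 0) -> List[Dict]:
    """Build n example rows, each with an integer id and a float vector."""
    rng = random.Random(seed)
    return [{"id": i, "vector": [rng.random() for _ in range(dim)]} for i in range(n)]


def build_schema(dim: int = DIM):
    """Hypothetical two-field schema; replace with your collection's schema."""
    # pymilvus is imported lazily so make_rows can be used without it installed.
    from pymilvus import CollectionSchema, DataType, FieldSchema

    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
        FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ]
    return CollectionSchema(fields=fields)


def write_local(rows: List[Dict], out_dir: str):
    """Convert rows to Parquet files on the local disk only (no upload)."""
    from pymilvus.bulk_writer import BulkFileType, LocalBulkWriter

    writer = LocalBulkWriter(
        schema=build_schema(),
        local_path=out_dir,
        file_type=BulkFileType.PARQUET,
    )
    for row in rows:
        writer.append_row(row)
    writer.commit()  # flush buffered rows to files under out_dir
    return writer.batch_files  # paths of the generated local files


def write_remote(rows: List[Dict], endpoint: str, access_key: str,
                 secret_key: str, bucket: str):
    """Convert rows and upload the files to an S3/MinIO-compatible bucket."""
    from pymilvus.bulk_writer import BulkFileType, RemoteBulkWriter

    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint=endpoint,        # placeholder, e.g. the MinIO host:port
        access_key=access_key,    # placeholder credentials
        secret_key=secret_key,
        bucket_name=bucket,       # assumed to be the bucket Milvus reads from
        secure=False,
    )
    writer = RemoteBulkWriter(
        schema=build_schema(),
        remote_path="/",
        connect_param=conn,
        file_type=BulkFileType.PARQUET,
    )
    for row in rows:
        writer.append_row(row)
    writer.commit()  # upload the generated files to the bucket
    return writer.batch_files  # object paths inside the bucket
```

In this sketch, `write_local(make_rows(1000), "/tmp/bulk_out")` only produces files on disk, whereas the paths returned by `write_remote(...)` are the ones that would typically be passed on to a bulk-insert call such as `utility.do_bulk_insert`. Whether that import step can read from the configured storage is exactly what this issue is about, so treat the local variant as preprocessing only.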