milvus-io / milvus-tools

A data migration tool for Milvus.
Apache License 2.0
69 stars 21 forks source link

Milvus Data Migration Tool

Overview

MilvusDM (Milvus Data Migration) is a data migration tool for Milvus that supports importing Faiss and HDF5 data files into Milvus, migrating data between Milvus, and it also supports batch backup of Milvus data to local files. Using milvusdm can help developers improve usage efficiency, reduce operation and maintenance costs.

Getting started

Prerequisites

Operating system Supported versions
CentOS 7.5 or higher
Ubuntu LTS 18.04 or higher
Software Version
Milvus 0.10.x or 1.x or 2.x
Python3 3.7 or higher
pip3 Corresponds to python version.

Install

Add the following two lines to ~/.bashrc file:

export MILVUSDM_PATH='/home/$user/milvusdm'
export LOGS_NUM=0

MILVUSDM_PATH: This parameter defines the working path of milvusdm. Logs and data generated by Milvusdm will be stored in this path. The default value is /home/$user/milvusdm.

LOGS_NUM: Milvusdm log file generates one per day. This parameter defines the number of log files to be saved. The default value is 0, which means all log files are saved.

Make the configured environment variables:

$ source ~/.bashrc
$ pip3 install pymilvusdm==2.0

The pymilvusdm2.0 is used to migrate data from Milvus(0.10.x or 1.x) to Milvus2.x.

How to use

Faiss to Milvus

Export one Faiss index file to Milvus in a specified collection or partition.

In the current version, only flat and ivf_flat indexes for floating data are supported.

$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/F2M.yaml
F2M:
  milvus_version: 1.x
  data_path: '/home/data/faiss1.index'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'append'
  dest_collection_name: 'test'
  dest_partition_name: ''
  collection_parameter:
    dimension: 256
    index_file_size: 1024
    metric_type: 'L2'
  1. Optional parameters: dest_partition_name
  2. The Parameter mode can be selected from append, skip, overwrite. This parameter takes effect only when the specified collection name exists in Milvus library.

​ append: Append data to the existing collection

​ skip: Skip the existing collection and do not perform any operations

​ overwrite: Delete the old collection, create a new collection with the same name and then import the data.

parameter description example
F2M Task: Export data in HDF5 to Milvus.
milvus_version Version of Milvus. 0.10.5
data_path Path to the data in Faiss. '/home/user/data/faiss.index'
dest_host Milvus server address '127.0.0.1'
dest_port Milvus server port. 19530
mode Mode of migration. 'append'
dest_collection_name Name of the collection to import data to. 'test'
dest_partition_name Name of the partition to import data to. (Optional) 'partition'
collection_parameter Collection-specific information such as vector dimension, index file size, and similarity metric. dimension: 512
index_file_size: 1024
metric_type: 'HAMMING'
$ milvusdm --yaml F2M.yaml

HDF5 to Milvus

Export one or more HDF5 files to Milvus in a specified collection or partition.

We provide the HDF5 examples of float vectors(dim-100) and binary vectors(dim-512) and their corresponding ids.

$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/H2M.yaml
H2M:
  milvus-version: 1.x
  data_path:
    - /Users/zilliz/float_1.h5
    - /Users/zilliz/float_2.h5
  data_dir:
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'overwrite'        # 'skip/append/overwrite'
  dest_collection_name: 'test_float'
  dest_partition_name: 'partition_1'
  collection_parameter:
    dimension: 128
    index_file_size: 1024
    metric_type: 'L2'

or

H2M:
  milvus_version: 1.x
  data_path:
  data_dir: '/Users/zilliz/HDF5_data'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'append'        # 'skip/append/overwrite'
  dest_collection_name: 'test_binary'
  dest_partition_name: 
  collection_parameter:
    dimension: 512
    index_file_size: 1024
    metric_type: 'HAMMING'
  1. Optional parameters: dest_partition_name
  2. Just configure data_path or data_dir, while the other one is None.
parameter description example
H2M Task: Export data in HDF5 to Milvus.
milvus_version Version of Milvus. 0.10.5
data_path Path to the HDF5 file. - /Users/zilliz/float_1.h5
- /Users/zilliz/float_1.h5
data_dir Directory of the HDF5 files. /Users/zilliz/Desktop/HDF5_data
dest_host Milvus server address. '127.0.0.1'
dest_port Milvus server port. 19530
mode Mode of migration 'append'
dest_collection_name Name of the collection to import data to. 'test_float'
dest_partition_name Name of the partition to import data to.(optional) 'partition_1'
collection_parameter Collection-specific information such as vector dimension, index file size, and similarity metric. dimension: 512
index_file_size: 1024
metric_type: 'HAMMING'
$ milvusdm --yaml H2M.yaml

Milvus to Milvus

MilvusDM does not support migrating data from Milvus 2.0 standalone to Milvus 2.0 cluster.

Copy a collection of source_milvus or multiple partitions of a collection into the corresponding collection or partition in dest_milvus.

$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2M.yaml
M2M:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    test:
      - 'partition_1'
      - 'partition_2'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'

Or

M2M:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
  source_collection: # specify the collection named 'test'
    test:
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'
  1. Required parameters: source_milvus_path, source_collection, dest_host, dest_port and mode.
  2. If you are using MySQL to manage source_milvus metadata, configure the mysql_parameter parameter, which is empty if you are using SQLite.
  3. The source_collection parameter must specify a collection name, and the following partition name is optional and multiple partitions can be added.
parameter description example
M2M Task: Copy the data from Milvus to the same version of Milvus.
milvus_version The dest-milvus version. 0.10.5
source_milvus_path Working directory of the source Milvus. '/home/user/milvus'
mysql_parameter MySQL settings for the source Milvus, including mysql host, user, port, password and database parameters. host: '127.0.0.1'
user: 'root'
port: 3306
password: '123456'
database: 'milvus'
source_collection Names of the collection and its partitions in the source Milvus. test:
- 'partition_1'
- 'partition_2'
dest_host Target Milvus server address. '127.0.0.1'
dest_port Target Milvus server port. 19530
mode Mode of migration 'skip'
$ milvusdm --yaml M2M.yaml

Milvus to HDF5

Export a Milvus collection or multiple partitions of a collection to a local HDF5 format file.

$ wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2H.yaml
M2H:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    test:
      - 'partition_1'
      - 'partition_2'
  data_dir: '/home/user/data'

Or

M2H:
  milvus_version: 1.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
  source_collection: # specify the collection named 'test'
    test:
  data_dir: '/home/user/data'
  1. The source_milvus_path, source_collection, and data_dir parameters are required.
  2. If you are using MySQL to manage source_milvus metadata, configure the mysql_parameter parameter, which is empty if you are using SQLite.
  3. The source_collection parameter must specify a collection name, and the following partition name is optional and multiple partitions can be added.
parameter description example
M2H Task: Export Milvus data to local HDF5 format files.
milvus_version The source-milvus version. 0.10.5
source_milvus_path Working directory of Milvus. '/home/user/milvus'
mysql_parameter MySQL settings for Milvus, including mysql host, user, port, password and database parameters. host: '127.0.0.1'
user: 'root'
port: 3306
password: '123456'
database: 'milvus'
source_collection Names of the collection and its partitions in Milvus. test:
- 'partition_1'
- 'partition_2'
data_dir Directory to save HDF5 files. '/home/user/data'
$ milvusdm --yaml M2H.yaml

Code Structure

If you would like to contribute code to this project, you can find out more about our code structure:

Todo