milvus-io / milvus-docs

This repository is for Milvus technical documentation update and maintenance. Visit Milvus.io for fully rendered technical documents.
Apache License 2.0

v2.0.x Insert data (insert_data.md) Doc Update #1496

Open liangyihan opened 2 years ago

liangyihan commented 2 years ago

Is there an existing issue for this?

Issue

1. Correct insert data structure:

input parameter: {"userFace":[[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]]}

return results: (screenshot not shown)

insert code:

    public R<MutationResult> insert(String partitionName, List<List<Float>> vectors) {
        System.out.println("========== insert vector() ==========");
        List<InsertParam.Field> fields = new ArrayList<>();
        fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));

        InsertParam insertParam = InsertParam.newBuilder()
                .withCollectionName("test2")
                .withPartitionName(partitionName)
                .withFields(fields)
                .build();
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("insert vector data: " + fields.get(i).getName() + "=" + fields.get(i).getValues());
        }
        R<MutationResult> response = milvusClient.insert(insertParam);
        handleResponseStatus(response);

        return response;
    }

2. Incorrect insert data structure:

input parameter: {"userFace":[[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]],"userAge":[30,31]}

return results: (screenshot not shown)

insert code:

    public R<MutationResult> insert(String partitionName, List<List<Float>> vectors, List<Integer> ages) {
        System.out.println("========== insert() vector and age ==========");
        List<InsertParam.Field> fields = new ArrayList<>();
        fields.add(new InsertParam.Field("userAge", DataType.Int8, ages));
        fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));

        InsertParam insertParam = InsertParam.newBuilder()
                .withCollectionName("test3")
                .withPartitionName(partitionName)
                .withFields(fields)
                .build();
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("insert vector and age data: " + fields.get(i).getName() + "=" + fields.get(i).getValues());
        }
        R<MutationResult> response = milvusClient.insert(insertParam);
        handleResponseStatus(response);

        return response;
    }
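Both methods above hand Milvus one value list per field, and an insert only behaves predictably when every field carries the same number of rows and every vector has the collection's dimension. A minimal, SDK-independent sketch of that consistency check (the class and method names are illustrative; dimension 4 matches the sample vectors above):

```java
import java.util.Arrays;
import java.util.List;

public class InsertDataCheck {
    // Returns true when every vector has the expected dimension and the
    // scalar column (ages) has exactly one value per vector row.
    static boolean rowsConsistent(List<List<Float>> vectors, List<Integer> ages, int dim) {
        if (vectors.size() != ages.size()) {
            return false; // every field in one insert must have the same row count
        }
        return vectors.stream().allMatch(v -> v.size() == dim);
    }

    public static void main(String[] args) {
        List<List<Float>> vectors = Arrays.asList(
                Arrays.asList(0.80020356f, 0.3490076f, 0.499359f, 0.94444656f),
                Arrays.asList(0.23521596f, 0.73284143f, 0.7519735f, 0.8311272f));
        List<Integer> ages = Arrays.asList(30, 31);
        System.out.println(rowsConsistent(vectors, ages, 4));                  // true
        System.out.println(rowsConsistent(vectors, Arrays.asList(30), 4));     // false
    }
}
```

If the two-field insert into test3 still stores wrong data even when such a check passes, the value types passed for each field (e.g. what the SDK expects for an Int8 field) are worth comparing against the collection schema.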

3. If the data is imported through the import function on the Attu page, both cases work correctly: userFace into test2, and userFace plus userAge into test3.

Based on the findings above, when the insert API is called to store the data, the stored data is wrong for the case that inserts userFace and userAge into test3. Please help me troubleshoot the problem. Thank you!

pom.xml as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.5.11-SNAPSHOT</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.example</groupId>
<artifactId>milvus2</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>milvus2</name>
<description>milvus2</description>
<properties>
    <java.version>1.8</java.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>io.milvus</groupId>
        <artifactId>milvus-sdk-java</artifactId>
        <version>2.0.4</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.30</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.33</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>
<pluginRepositories>
    <pluginRepository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </pluginRepository>
    <pluginRepository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </pluginRepository>
</pluginRepositories>
</project>

docker-compose.yaml as follows:

    version: '3.5'

    services:
      etcd:
        container_name: milvus-etcd
        image: quay.io/coreos/etcd:v3.5.0
        restart: always
        networks:
        #  - etcd-net
        ports:
          - "2379:2379"
          - "2380:2380"
        environment:
          - ETCD_AUTO_COMPACTION_MODE=revision
          - ETCD_AUTO_COMPACTION_RETENTION=1000
          - ETCD_QUOTA_BACKEND_BYTES=4294967296
        volumes:
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
        command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

      minio:
        container_name: milvus-minio
        image: minio/minio:RELEASE.2022-04-01T03-41-39Z
        environment:
          MINIO_ROOT_USER: minioadmin
          # MINIO_ROOT_PASSWORD: minioadmin123
          MINIO_PROMETHEUS_AUTH_TYPE: public
        volumes:
          - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
        ports:
          - "9000:9000"
          - "9090:9090"
        command: minio server /minio_data --console-address ":9090" -address ":9000"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
          interval: 30s
          timeout: 20s
          retries: 3

      standalone:
        container_name: milvus-standalone
        image: milvusdb/milvus:v2.0.2
        command: ["milvus", "run", "standalone"]
        environment:
          ETCD_ENDPOINTS: etcd:2379
          MINIO_ADDRESS: minio:9000
          PULSAR_ADDRESS: pulsar://pulsar:6650
        volumes:

    networks:
      default:
        name: milvus2.0.2

configs/milvus.yaml as follows:

    # Licensed to the LF AI & Data foundation under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # Related configuration of etcd, used to store Milvus metadata.
    etcd:
      endpoints:

    # Related configuration of minio, which is responsible for data persistence for Milvus.
    minio:
      address: localhost          # Address of MinIO/S3
      port: 9000                  # Port of MinIO/S3
      accessKeyID: minioadmin     # accessKeyID of MinIO/S3
      secretAccessKey: minioadmin # MinIO/S3 encryption string
      useSSL: false               # Access to MinIO/S3 with SSL
      bucketName: "a-bucket"      # Bucket name in MinIO/S3
      rootPath: files             # The root path where the message is stored in MinIO/S3

    # Related configuration of pulsar, used to manage Milvus logs of recent mutation
    # operations, output streaming log, and provide log publish-subscribe services.
    pulsar:
      address: localhost          # Address of pulsar
      port: 6650                  # Port of pulsar
      maxMessageSize: 5242880     # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.

    rocksmq:
      path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
      rocksmqPageSize: 2147483648    # 2 GB, 2 * 1024 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
      retentionTimeInMinutes: 10080  # 7 days, 7 * 24 * 60 minutes, The retention time of the message in rocksmq.
      retentionSizeInMB: 8192        # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.

    # Related configuration of rootCoord, used to handle data definition language (DDL)
    # and data control language (DCL) requests
    rootCoord:
      address: localhost
      port: 53100
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32, Maximum data size received by the server
        serverMaxSendSize: 2147483647 # math.MaxInt32, Maximum data size sent by the server
        clientMaxRecvSize: 104857600  # 100 MB, Maximum data size received by the client
        clientMaxSendSize: 104857600  # 100 MB, Maximum data size sent by the client
      dmlChannelNum: 256              # The number of dml channels created at system startup
      maxPartitionNum: 4096           # Maximum number of partitions in a collection
      minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed

    # Related configuration of proxy, used to validate client requests and reduce the returned results.
    proxy:
      port: 19530
      grpc:
        serverMaxRecvSize: 536870912 # 512 MB, 512 * 1024 * 1024 Bytes
        serverMaxSendSize: 536870912 # 512 MB, 512 * 1024 * 1024 Bytes
        clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
      timeTickInterval: 200          # ms, the interval that proxy synchronize the time tick
      msgStream:
        timeTick:
          bufSize: 512
      maxNameLength: 255             # Maximum length of name for a collection or alias
      maxFieldNum: 256               # Maximum number of fields in a collection
      maxDimension: 32768            # Maximum dimension of a vector
      maxShardNum: 256               # Maximum number of shards in a collection
      maxTaskNum: 1024               # max task number of proxy task queue
      bufFlagExpireTime: 3600        # second, the time to expire bufFlag from cache in collectResultLoop
      bufFlagCleanupInterval: 600    # second, the interval to clean bufFlag cache in collectResultLoop

    # Related configuration of queryCoord, used to manage topology and load balancing for the
    # query nodes, and handoff from growing segments to sealed segments.
    queryCoord:
      address: localhost
      port: 19531
      autoHandoff: true # Enable auto handoff
      autoBalance: true # Enable auto balance
      overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
      balanceIntervalSeconds: 60
      memoryUsageMaxDifferencePercentage: 30
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024

    # Related configuration of queryNode, used to run hybrid search between vector and scalar data.
    queryNode:
      cacheSize: 32   # GB, default 32 GB, cacheSize is the memory used for caching data for faster query. The cacheSize must be less than system memory size.
      gracefulTime: 0 # Minimum time before the newly inserted data can be searched (in ms)
      port: 21123
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024
      stats:
        publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
      dataSync:
        flowGraph:
          maxQueueLength: 1024 # Maximum length of task queue in flowgraph
          maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
      msgStream:
        search:
          recvBufSize: 512   # msgPack channel buffer size
          pulsarBufSize: 512 # pulsar channel buffer size
        searchResult:
          recvBufSize: 64    # msgPack channel buffer size
      # Segcore will divide a segment into multiple chunks.
      segcore:
        chunkRows: 32768 # The number of vectors in a chunk.

    indexCoord:
      address: localhost
      port: 31000
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024

    indexNode:
      port: 21121
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024

    dataCoord:
      address: localhost
      port: 13333
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024
      enableCompaction: true          # Enable data segment compression
      enableGarbageCollection: false
      segment:
        maxSize: 512               # Maximum size of a segment in MB
        sealProportion: 0.75       # It's the minimum proportion for a segment which can be sealed
        assignmentExpiration: 2000 # The time of the assignment expiration in ms
      compaction:
        enableAutoCompaction: true
      gc:
        interval: 3600          # gc interval in seconds
        missingTolerance: 86400 # file meta missing tolerance duration in seconds, 60 * 24 * 60
        dropTolerance: 86400    # file belongs to dropped entity tolerance duration in seconds, 60 * 24 * 60

    dataNode:
      port: 21124
      grpc:
        serverMaxRecvSize: 2147483647 # math.MaxInt32
        serverMaxSendSize: 2147483647 # math.MaxInt32
        clientMaxRecvSize: 104857600  # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600  # 100 MB, 100 * 1024 * 1024
      dataSync:
        flowGraph:
          maxQueueLength: 1024 # Maximum length of task queue in flowgraph
          maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
      flush:
        # Max buffer size to flush for a single segment.
        insertBufSize: 16777216 # Bytes, 16 MB

    # Configure whether to store the vector and the local path when querying/searching in Querynode.
    localStorage:
      path: /var/lib/milvus/data/
      enabled: true

    # Configures the system log output.
    log:
      level: debug   # info, warn, error, panic, fatal
      file:
        rootPath: "" # default to stdout, stderr
        maxSize: 300 # MB
        maxAge: 10   # Maximum time for log retention in day.
        maxBackups: 20
      format: json   # text/json

    msgChannel:
      # Channel name generation rule: ${namePrefix}-${ChannelIdx}
      chanNamePrefix:
        cluster: "by-dev"
        rootCoordTimeTick: "rootcoord-timetick"
        rootCoordStatistics: "rootcoord-statistics"
        rootCoordDml: "rootcoord-dml"
        rootCoordDelta: "rootcoord-delta"
        search: "search"
        searchResult: "searchResult"
        proxyTimeTick: "proxyTimeTick"
        queryTimeTick: "queryTimeTick"
        queryNodeStats: "query-node-stats"
        # Cmd for loadIndex, flush, etc...
        cmd: "cmd"
        dataCoordInsertChannel: "insert-channel-"
        dataCoordStatistic: "datacoord-statistics-channel"
        dataCoordTimeTick: "datacoord-timetick-channel"
        dataCoordSegmentInfo: "segment-info-channel"
      # Sub name generation rule: ${subNamePrefix}-${NodeID}
      subNamePrefix:
        rootCoordSubNamePrefix: "rootCoord"
        proxySubNamePrefix: "proxy"
        queryNodeSubNamePrefix: "queryNode"
        dataNodeSubNamePrefix: "dataNode"
        dataCoordSubNamePrefix: "dataCoord"

    common:
      defaultPartitionName: "_default" # default partition name for a collection
      defaultIndexName: "_default_idx" # default index name
      retentionDuration: 432000        # 5 days in seconds

    knowhere:
      # Default value: auto
      # Valid values: [auto, avx512, avx2, avx, sse4_2]
      # This configuration is only used by querynode and indexnode,
      # it selects CPU instruction set for Searching and Index-building.
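Several values in the config above are byte counts written out in full (5 MB, 2 GB, math.MaxInt32, 16 MB). The arithmetic in their comments can be double-checked directly; a quick sketch in Java (note the `long` literal to avoid 32-bit overflow for the 2 GB page size):

```java
public class ConfigSizes {
    public static void main(String[] args) {
        // pulsar.maxMessageSize: 5 MB
        System.out.println(5 * 1024 * 1024);         // 5242880
        // rocksmq.rocksmqPageSize: 2 GB (exceeds int range, so compute as long)
        System.out.println(2L * 1024 * 1024 * 1024); // 2147483648
        // grpc serverMaxRecvSize: math.MaxInt32
        System.out.println(Integer.MAX_VALUE);       // 2147483647
        // dataNode flush.insertBufSize: 16 MB
        System.out.println(16 * 1024 * 1024);        // 16777216
    }
}
```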

thanks again!

Suggestion

No response

Anything else?

No response

LocoRichard commented 2 years ago

Hi @liangyihan, this repo is for Milvus technical documentation. For Milvus usage-related issues, please file the issue in https://github.com/milvus-io/milvus for a prompt response.