insert的代码:
public R insert(String partitionName,List<List> vectors,List ages) {
System.out.println("========== insert() vector and age ==========");
List fields = new ArrayList<>();
fields.add(new InsertParam.Field("userAge", DataType.Int8, ages));
fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));
InsertParam insertParam = InsertParam.newBuilder()
.withCollectionName("test3")
.withPartitionName(partitionName)
.withFields(fields)
.build();
for (int i=0;i<fields.size();i++){
System.out.println("insert vector and age data: "+fields.get(i).getName()+"="+fields.get(i).getValues());
}
R<MutationResult> response = milvusClient.insert(insertParam);
handleResponseStatus(response);
return response;
}
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Related configuration of etcd, used to store Milvus metadata.
etcd:
endpoints:
localhost:2379
rootPath: by-dev # The root path where data is stored in etcd
metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
address: localhost # Address of MinIO/S3
port: 9000 # Port of MinIO/S3
accessKeyID: minioadmin # accessKeyID of MinIO/S3
secretAccessKey: minioadmin # MinIO/S3 encryption string
useSSL: false # Access to MinIO/S3 with SSL
bucketName: "a-bucket" # Bucket name in MinIO/S3
rootPath: files # The root path where the message is stored in MinIO/S3
Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
address: localhost # Address of pulsar
port: 6650 # Port of pulsar
maxMessageSize: 5242880 # 5 1024 1024 Bytes, Maximum size of each message in pulsar.
rocksmq:
path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
rocksmqPageSize: 2147483648 # 2 GB, 2 1024 1024 1024 bytes, The size of each page of messages in rocksmq
retentionTimeInMinutes: 10080 # 7 days, 7 24 60 minutes, The retention time of the message in rocksmq.
retentionSizeInMB: 8192 # 8 GB, 8 1024 MB, The retention size of the message in rocksmq.
Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
address: localhost
port: 53100
grpc:
serverMaxRecvSize: 2147483647 # math.MaxInt32, Maximum data size received by the server
serverMaxSendSize: 2147483647 # math.MaxInt32, Maximum data size sent by the server
clientMaxRecvSize: 104857600 # 100 MB, Maximum data size received by the client
clientMaxSendSize: 104857600 # 100 MB, Maximum data size sent by the client
dmlChannelNum: 256 # The number of dml channels created at system startup
maxPartitionNum: 4096 # Maximum number of partitions in a collection
minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
Related configuration of proxy, used to validate client requests and reduce the returned results.
timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick
msgStream:
timeTick:
bufSize: 512
maxNameLength: 255 # Maximum length of name for a collection or alias
maxFieldNum: 256 # Maximum number of fields in a collection
maxDimension: 32768 # Maximum dimension of a vector
maxShardNum: 256 # Maximum number of shards in a collection
maxTaskNum: 1024 # max task number of proxy task queue
bufFlagExpireTime: 3600 # second, the time to expire bufFlag from cache in collectResultLoop
bufFlagCleanupInterval: 600 # second, the interval to clean bufFlag cache in collectResultLoop
Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
address: localhost
port: 19531
autoHandoff: true # Enable auto handoff
autoBalance: true # Enable auto balance
overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
balanceIntervalSeconds: 60
memoryUsageMaxDifferencePercentage: 30
Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
cacheSize: 32 # GB, default 32 GB, cacheSize is the memory used for caching data for faster query. The cacheSize must be less than system memory size.
gracefulTime: 0 # Minimum time before the newly inserted data can be searched (in ms)
port: 21123
segment:
maxSize: 512 # Maximum size of a segment in MB
sealProportion: 0.75 # It's the minimum proportion for a segment which can be sealed
assignmentExpiration: 2000 # The time of the assignment expiration in ms
compaction:
enableAutoCompaction: true
gc:
interval: 3600 # gc interval in seconds
missingTolerance: 86400 # file meta missing tolerance duration in seconds, 6024
dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds, 6024
dataSync:
flowGraph:
maxQueueLength: 1024 # Maximum length of task queue in flowgraph
maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
flush:
Max buffer size to flush for a single segment.
insertBufSize: 16777216 # Bytes, 16 MB
Configure whether to store the vector and the local path when querying/searching in Querynode.
common:
defaultPartitionName: "_default" # default partition name for a collection
defaultIndexName: "_default_idx" # default index name
retentionDuration: 432000 # 5 days in seconds
knowhere:
Default value: auto
Valid values: [auto, avx512, avx2, avx, sse4_2]
This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
Hi @liangyihan , this repo is for Milvus technical documentation. For Milvus usage-related issues, you can file the issue in https://github.com/milvus-io/milvus for a prompt respond.
Is there an existing issue for this?
Issue
1.插入数据正常的情况 集合结构 输入参数: {"userFace":[[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]]} 返回结果 insert的代码: public R insert(String partitionName,List<List> vectors) {
System.out.println("========== insert vector() ==========");
List fields = new ArrayList<>();
fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));
2.插入数据不正常的情况 集合结构 输入参数: {"userFace":[[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]],"userAge":[30,31]} 返回结果
insert的代码: public R insert(String partitionName,List<List> vectors,List ages) {
System.out.println("========== insert() vector and age ==========");
List fields = new ArrayList<>();
fields.add(new InsertParam.Field("userAge", DataType.Int8, ages));
fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));
3.如果通过页面的import导入数据,无论是导入userFace到test2还是导入userFace、userAge到test3都是正常的
通过以上发现,如果是调用insert api来入库当插入数据userFace、userAge到test3时数据是不对的,麻烦帮我排查一下问题,谢谢!
pom.xml 如下 <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
docker-compose启动的milvus配置如下: version: '3.5'
services: etcd: container_name: milvus-etcd image: quay.io/coreos/etcd:v3.5.0 restart: always
networks:
minio: container_name: milvus-minio image: minio/minio:RELEASE.2022-04-01T03-41-39Z environment:
MINIO_ROOT_USER: minioadmin
standalone: container_name: milvus-standalone image: milvusdb/milvus:v2.0.2 command: ["milvus", "run", "standalone"] environment: ETCD_ENDPOINTS: etcd:2379 MINIO_ADDRESS: minio:9000 PULSAR_ADDRESS: pulsar://pulsar:6650 volumes:
networks: default: name: milvus2.0.2
configs/milvus.yaml 如下
Licensed to the LF AI & Data foundation under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
#
http://www.apache.org/licenses/LICENSE-2.0
#
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Related configuration of etcd, used to store Milvus metadata.
etcd: endpoints:
Related configuration of minio, which is responsible for data persistence for Milvus.
minio: address: localhost # Address of MinIO/S3 port: 9000 # Port of MinIO/S3 accessKeyID: minioadmin # accessKeyID of MinIO/S3 secretAccessKey: minioadmin # MinIO/S3 encryption string useSSL: false # Access to MinIO/S3 with SSL bucketName: "a-bucket" # Bucket name in MinIO/S3 rootPath: files # The root path where the message is stored in MinIO/S3
Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar: address: localhost # Address of pulsar port: 6650 # Port of pulsar maxMessageSize: 5242880 # 5 1024 1024 Bytes, Maximum size of each message in pulsar.
rocksmq: path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq rocksmqPageSize: 2147483648 # 2 GB, 2 1024 1024 1024 bytes, The size of each page of messages in rocksmq retentionTimeInMinutes: 10080 # 7 days, 7 24 60 minutes, The retention time of the message in rocksmq. retentionSizeInMB: 8192 # 8 GB, 8 1024 MB, The retention size of the message in rocksmq.
Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord: address: localhost port: 53100
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32, Maximum data size received by the server serverMaxSendSize: 2147483647 # math.MaxInt32, Maximum data size sent by the server clientMaxRecvSize: 104857600 # 100 MB, Maximum data size received by the client clientMaxSendSize: 104857600 # 100 MB, Maximum data size sent by the client
dmlChannelNum: 256 # The number of dml channels created at system startup maxPartitionNum: 4096 # Maximum number of partitions in a collection minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy: port: 19530
grpc: serverMaxRecvSize: 536870912 # 512 MB, 512 1024 1024 Bytes serverMaxSendSize: 536870912 # 512 MB, 512 1024 1024 Bytes clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick msgStream: timeTick: bufSize: 512 maxNameLength: 255 # Maximum length of name for a collection or alias maxFieldNum: 256 # Maximum number of fields in a collection maxDimension: 32768 # Maximum dimension of a vector maxShardNum: 256 # Maximum number of shards in a collection maxTaskNum: 1024 # max task number of proxy task queue bufFlagExpireTime: 3600 # second, the time to expire bufFlag from cache in collectResultLoop bufFlagCleanupInterval: 600 # second, the interval to clean bufFlag cache in collectResultLoop
Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord: address: localhost port: 19531 autoHandoff: true # Enable auto handoff autoBalance: true # Enable auto balance overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload balanceIntervalSeconds: 60 memoryUsageMaxDifferencePercentage: 30
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode: cacheSize: 32 # GB, default 32 GB,
cacheSize
is the memory used for caching data for faster query. ThecacheSize
must be less than system memory size. gracefulTime: 0 # Minimum time before the newly inserted data can be searched (in ms) port: 21123grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
stats: publishInterval: 1000 # Interval for querynode to report node information (milliseconds) dataSync: flowGraph: maxQueueLength: 1024 # Maximum length of task queue in flowgraph maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph msgStream: search: recvBufSize: 512 # msgPack channel buffer size pulsarBufSize: 512 # pulsar channel buffer size searchResult: recvBufSize: 64 # msgPack channel buffer size
Segcore will divide a segment into multiple chunks.
segcore: chunkRows: 32768 # The number of vectors in a chunk.
indexCoord: address: localhost port: 31000
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
indexNode: port: 21121
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
dataCoord: address: localhost port: 13333
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024 enableCompaction: true # Enable data segment compression enableGarbageCollection: false
segment: maxSize: 512 # Maximum size of a segment in MB sealProportion: 0.75 # It's the minimum proportion for a segment which can be sealed assignmentExpiration: 2000 # The time of the assignment expiration in ms
compaction: enableAutoCompaction: true
gc: interval: 3600 # gc interval in seconds missingTolerance: 86400 # file meta missing tolerance duration in seconds, 6024 dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds, 6024
dataNode: port: 21124
grpc: serverMaxRecvSize: 2147483647 # math.MaxInt32 serverMaxSendSize: 2147483647 # math.MaxInt32 clientMaxRecvSize: 104857600 # 100 MB, 100 1024 1024 clientMaxSendSize: 104857600 # 100 MB, 100 1024 1024
dataSync: flowGraph: maxQueueLength: 1024 # Maximum length of task queue in flowgraph maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph flush:
Max buffer size to flush for a single segment.
Configure whether to store the vector and the local path when querying/searching in Querynode.
localStorage: path: /var/lib/milvus/data/ enabled: true
Configures the system log output.
log: level: debug # info, warn, error, panic, fatal file: rootPath: "" # default to stdout, stderr maxSize: 300 # MB maxAge: 10 # Maximum time for log retention in day. maxBackups: 20 format: json # text/json
msgChannel:
Channel name generation rule: ${namePrefix}-${ChannelIdx}
chanNamePrefix: cluster: "by-dev" rootCoordTimeTick: "rootcoord-timetick" rootCoordStatistics: "rootcoord-statistics" rootCoordDml: "rootcoord-dml" rootCoordDelta: "rootcoord-delta" search: "search" searchResult: "searchResult" proxyTimeTick: "proxyTimeTick" queryTimeTick: "queryTimeTick" queryNodeStats: "query-node-stats"
Cmd for loadIndex, flush, etc...
Sub name generation rule: ${subNamePrefix}-${NodeID}
subNamePrefix: rootCoordSubNamePrefix: "rootCoord" proxySubNamePrefix: "proxy" queryNodeSubNamePrefix: "queryNode" dataNodeSubNamePrefix: "dataNode" dataCoordSubNamePrefix: "dataCoord"
common: defaultPartitionName: "_default" # default partition name for a collection defaultIndexName: "_default_idx" # default index name retentionDuration: 432000 # 5 days in seconds
knowhere:
Default value: auto
Valid values: [auto, avx512, avx2, avx, sse4_2]
This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
再次感谢!
Suggestion
No response
Anything else?
No response