2. Incorrect insert data
Set structure:
Input parameter:
{"userFace": [[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]], "userAge": [30, 31]}
Return results:
Insert code:

public R<MutationResult> insert(String partitionName, List<List<Float>> vectors, List<Integer> ages) {
    System.out.println("========== insert() vector and age ==========");
    // Column-based insert: each Field carries one column of values.
    List<InsertParam.Field> fields = new ArrayList<>();
    fields.add(new InsertParam.Field("userAge", DataType.Int8, ages));
    fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));
    InsertParam insertParam = InsertParam.newBuilder()
            .withCollectionName("test3")
            .withPartitionName(partitionName)
            .withFields(fields)
            .build();
    // Log exactly what is being sent, one field at a time.
    for (int i = 0; i < fields.size(); i++) {
        System.out.println("insert vector and age data: " + fields.get(i).getName() + "=" + fields.get(i).getValues());
    }
    R<MutationResult> response = milvusClient.insert(insertParam);
    handleResponseStatus(response);
    return response;
}
3. If the data is imported through the import function on the Attu page, it is stored correctly: userFace into test2, and userFace plus userAge into test3.
Based on the findings above, when the insert API is called to store the data, the stored data is wrong when inserting userFace and userAge into test3. Please help me troubleshoot the problem. Thank you!
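Before suspecting the server, it may help to rule out malformed input on the client. The sketch below (plain Java, no Milvus SDK; `InsertValidator` and its methods are hypothetical names, not SDK API) checks three things that commonly corrupt column-based inserts: unequal row counts across fields, vectors whose length differs from the schema dimension, and Int8 values outside a signed byte's range.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-insert sanity checks; not part of the Milvus Java SDK.
public class InsertValidator {

    // All columns passed to a column-based insert must contain the same
    // number of rows, otherwise values pair up with the wrong entities.
    public static boolean sameRowCount(List<?>... columns) {
        int rows = columns[0].size();
        for (List<?> c : columns) {
            if (c.size() != rows) return false;
        }
        return true;
    }

    // Every vector must match the dimension declared in the collection schema.
    public static boolean dimensionsMatch(List<List<Float>> vectors, int dim) {
        for (List<Float> v : vectors) {
            if (v.size() != dim) return false;
        }
        return true;
    }

    // Int8 is a signed byte on the server side; out-of-range Java ints
    // would be truncated or rejected.
    public static boolean fitsInt8(List<Integer> values) {
        for (int v : values) {
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Float>> faces = Arrays.asList(
                Arrays.asList(0.80020356f, 0.3490076f, 0.499359f, 0.94444656f),
                Arrays.asList(0.23521596f, 0.73284143f, 0.7519735f, 0.8311272f));
        List<Integer> ages = Arrays.asList(30, 31);

        System.out.println(sameRowCount(faces, ages)); // 2 rows in each column
        System.out.println(dimensionsMatch(faces, 4)); // both vectors have dim 4
        System.out.println(fitsInt8(ages));            // 30 and 31 fit in a byte
    }
}
```

If all three checks pass for the failing payload, the problem is more likely on the server or SDK side than in the input data.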
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Related configuration of etcd, used to store Milvus metadata.
etcd:
  endpoints:
    - localhost:2379
  rootPath: by-dev # The root path where data is stored in etcd
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath

# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  bucketName: "a-bucket" # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
  address: localhost # Address of pulsar
  port: 6650 # Port of pulsar
  maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.

rocksmq:
  path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
  rocksmqPageSize: 2147483648 # 2 GB, 2 * 1024 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
  retentionTimeInMinutes: 10080 # 7 days, 7 * 24 * 60 minutes, The retention time of the message in rocksmq.
  retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
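The numeric comments in this config are simple multiplications whose `*` characters were dropped when the file was pasted (e.g. `5 1024 1024` is `5 * 1024 * 1024`). A quick sketch confirming the stated values:

```java
// Verifies that the byte/minute values in the config match the
// multiplications given in their comments.
public class ConfigSizes {
    public static void main(String[] args) {
        System.out.println(5L * 1024 * 1024);        // 5242880    (pulsar maxMessageSize)
        System.out.println(2L * 1024 * 1024 * 1024); // 2147483648 (rocksmqPageSize, 2 GB)
        System.out.println(7 * 24 * 60);             // 10080      (retentionTimeInMinutes, 7 days)
        System.out.println(8 * 1024);                // 8192       (retentionSizeInMB, 8 GB)
    }
}
```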
# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
  address: localhost
  port: 53100
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32, Maximum data size received by the server
    serverMaxSendSize: 2147483647 # math.MaxInt32, Maximum data size sent by the server
    clientMaxRecvSize: 104857600 # 100 MB, Maximum data size received by the client
    clientMaxSendSize: 104857600 # 100 MB, Maximum data size sent by the client
  dmlChannelNum: 256 # The number of dml channels created at system startup
  maxPartitionNum: 4096 # Maximum number of partitions in a collection
  minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick
  msgStream:
    timeTick:
      bufSize: 512
  maxNameLength: 255 # Maximum length of name for a collection or alias
  maxFieldNum: 256 # Maximum number of fields in a collection
  maxDimension: 32768 # Maximum dimension of a vector
  maxShardNum: 256 # Maximum number of shards in a collection
  maxTaskNum: 1024 # max task number of proxy task queue
  bufFlagExpireTime: 3600 # second, the time to expire bufFlag from cache in collectResultLoop
  bufFlagCleanupInterval: 600 # second, the interval to clean bufFlag cache in collectResultLoop
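The proxy limits listed in the config can be mirrored client-side to fail fast before a request ever reaches Milvus. This is an illustrative sketch, not SDK functionality; `ProxyLimits` and `validCollection` are hypothetical names:

```java
// Hypothetical client-side guard mirroring the proxy limits from milvus.yaml;
// the server enforces these itself, this just rejects bad requests early.
public class ProxyLimits {
    static final int MAX_NAME_LENGTH = 255;  // maxNameLength
    static final int MAX_FIELD_NUM = 256;    // maxFieldNum
    static final int MAX_DIMENSION = 32768;  // maxDimension
    static final int MAX_SHARD_NUM = 256;    // maxShardNum

    public static boolean validCollection(String name, int fieldCount, int dim, int shards) {
        return name.length() <= MAX_NAME_LENGTH
                && fieldCount <= MAX_FIELD_NUM
                && dim <= MAX_DIMENSION
                && shards <= MAX_SHARD_NUM;
    }

    public static void main(String[] args) {
        // "test3" with two fields (userFace dim 4, userAge) easily fits the limits.
        System.out.println(validCollection("test3", 2, 4, 2));
    }
}
```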
# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  address: localhost
  port: 19531
  autoHandoff: true # Enable auto handoff
  autoBalance: true # Enable auto balance
  overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
  cacheSize: 32 # GB, default 32 GB, cacheSize is the memory used for caching data for faster query. The cacheSize must be less than system memory size.
  gracefulTime: 0 # Minimum time before the newly inserted data can be searched (in ms)
  port: 21123

dataCoord:
  segment:
    maxSize: 512 # Maximum size of a segment in MB
    sealProportion: 0.75 # It's the minimum proportion for a segment which can be sealed
    assignmentExpiration: 2000 # The time of the assignment expiration in ms
  compaction:
    enableAutoCompaction: true
  gc:
    interval: 3600 # gc interval in seconds
    missingTolerance: 86400 # file meta missing tolerance duration in seconds, 60*24*60
    dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds, 60*24*60

dataNode:
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  flush:
    # Max buffer size to flush for a single segment.
    insertBufSize: 16777216 # Bytes, 16 MB
# Configure whether to store the vector and the local path when querying/searching in Querynode.

common:
  defaultPartitionName: "_default" # default partition name for a collection
  defaultIndexName: "_default_idx" # default index name
  retentionDuration: 432000 # 5 days in seconds

knowhere:
  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
Is there an existing issue for this?
Issue
1. Correct insert data
Set structure:
Input parameter:
{"userFace": [[0.80020356, 0.3490076, 0.499359, 0.94444656], [0.23521596, 0.73284143, 0.7519735, 0.8311272]]}
Return results:
Insert code:

public R<MutationResult> insert(String partitionName, List<List<Float>> vectors) {
    System.out.println("========== insert vector() ==========");
    List<InsertParam.Field> fields = new ArrayList<>();
    fields.add(new InsertParam.Field("userFace", DataType.FloatVector, vectors));
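In a column-based insert, the i-th element of every field's list belongs to the same entity. A small SDK-free model of that pairing (the `ColumnsToRows` class and its `toRows` helper are hypothetical) shows which age is supposed to land with which vector, which can be compared against what Attu displays after the insert:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models how column-based insert data pairs up into rows (entities).
public class ColumnsToRows {

    // Zip an age column and a vector column into "userAge=.., userFace=.." rows.
    public static List<String> toRows(List<Integer> ages, List<List<Float>> faces) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < ages.size(); i++) {
            rows.add("userAge=" + ages.get(i) + ", userFace=" + faces.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(30, 31);
        List<List<Float>> faces = Arrays.asList(
                Arrays.asList(0.80020356f, 0.3490076f, 0.499359f, 0.94444656f),
                Arrays.asList(0.23521596f, 0.73284143f, 0.7519735f, 0.8311272f));
        // Expected pairing: age 30 with the first vector, age 31 with the second.
        for (String row : toRows(ages, faces)) {
            System.out.println(row);
        }
    }
}
```

If the rows stored in test3 differ from this pairing, the corruption happens between the client call and storage rather than in the input.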
pom.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
docker-compose.yaml as follows:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.0
    restart: always
    networks:
  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2022-04-01T03-41-39Z
    environment:
      MINIO_ROOT_USER: minioadmin
  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.0.2
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
      PULSAR_ADDRESS: pulsar://pulsar:6650
    volumes:

networks:
  default:
    name: milvus2.0.2
configs/milvus.yaml as follows:

# Licensed to the LF AI & Data foundation under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Related configuration of etcd, used to store Milvus metadata.
etcd:
  endpoints:
    - localhost:2379

# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  bucketName: "a-bucket" # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3

# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
  address: localhost # Address of pulsar
  port: 6650 # Port of pulsar
  maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.

rocksmq:
  path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
  rocksmqPageSize: 2147483648 # 2 GB, 2 * 1024 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
  retentionTimeInMinutes: 10080 # 7 days, 7 * 24 * 60 minutes, The retention time of the message in rocksmq.
  retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
  address: localhost
  port: 53100
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32, Maximum data size received by the server
    serverMaxSendSize: 2147483647 # math.MaxInt32, Maximum data size sent by the server
    clientMaxRecvSize: 104857600 # 100 MB, Maximum data size received by the client
    clientMaxSendSize: 104857600 # 100 MB, Maximum data size sent by the client
  dmlChannelNum: 256 # The number of dml channels created at system startup
  maxPartitionNum: 4096 # Maximum number of partitions in a collection
  minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed

# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  port: 19530
  grpc:
    serverMaxRecvSize: 536870912 # 512 MB, 512 * 1024 * 1024 Bytes
    serverMaxSendSize: 536870912 # 512 MB, 512 * 1024 * 1024 Bytes
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
  timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick
  msgStream:
    timeTick:
      bufSize: 512
  maxNameLength: 255 # Maximum length of name for a collection or alias
  maxFieldNum: 256 # Maximum number of fields in a collection
  maxDimension: 32768 # Maximum dimension of a vector
  maxShardNum: 256 # Maximum number of shards in a collection
  maxTaskNum: 1024 # max task number of proxy task queue
  bufFlagExpireTime: 3600 # second, the time to expire bufFlag from cache in collectResultLoop
  bufFlagCleanupInterval: 600 # second, the interval to clean bufFlag cache in collectResultLoop
# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  address: localhost
  port: 19531
  autoHandoff: true # Enable auto handoff
  autoBalance: true # Enable auto balance
  overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024

# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
  cacheSize: 32 # GB, default 32 GB, cacheSize is the memory used for caching data for faster query. The cacheSize must be less than system memory size.
  gracefulTime: 0 # Minimum time before the newly inserted data can be searched (in ms)
  port: 21123
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
  stats:
    publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  msgStream:
    search:
      recvBufSize: 512 # msgPack channel buffer size
      pulsarBufSize: 512 # pulsar channel buffer size
    searchResult:
      recvBufSize: 64 # msgPack channel buffer size
  # Segcore will divide a segment into multiple chunks.
  segcore:
    chunkRows: 32768 # The number of vectors in a chunk.

indexCoord:
  address: localhost
  port: 31000
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024

indexNode:
  port: 21121
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024

dataCoord:
  address: localhost
  port: 13333
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
  enableCompaction: true # Enable data segment compression
  enableGarbageCollection: false
  segment:
    maxSize: 512 # Maximum size of a segment in MB
    sealProportion: 0.75 # It's the minimum proportion for a segment which can be sealed
    assignmentExpiration: 2000 # The time of the assignment expiration in ms
  compaction:
    enableAutoCompaction: true
  gc:
    interval: 3600 # gc interval in seconds
    missingTolerance: 86400 # file meta missing tolerance duration in seconds, 60*24*60
    dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds, 60*24*60

dataNode:
  port: 21124
  grpc:
    serverMaxRecvSize: 2147483647 # math.MaxInt32
    serverMaxSendSize: 2147483647 # math.MaxInt32
    clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
    clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
  dataSync:
    flowGraph:
      maxQueueLength: 1024 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  flush:
    # Max buffer size to flush for a single segment.
# Configure whether to store the vector and the local path when querying/searching in Querynode.
localStorage:
  path: /var/lib/milvus/data/
  enabled: true

# Configures the system log output.
log:
  level: debug # info, warn, error, panic, fatal
  file:
    rootPath: "" # default to stdout, stderr
    maxSize: 300 # MB
    maxAge: 10 # Maximum time for log retention in day.
    maxBackups: 20
  format: json # text/json

msgChannel:
  # Channel name generation rule: ${namePrefix}-${ChannelIdx}
  chanNamePrefix:
    cluster: "by-dev"
    rootCoordTimeTick: "rootcoord-timetick"
    rootCoordStatistics: "rootcoord-statistics"
    rootCoordDml: "rootcoord-dml"
    rootCoordDelta: "rootcoord-delta"
    search: "search"
    searchResult: "searchResult"
    proxyTimeTick: "proxyTimeTick"
    queryTimeTick: "queryTimeTick"
    queryNodeStats: "query-node-stats"
    # Cmd for loadIndex, flush, etc...
  # Sub name generation rule: ${subNamePrefix}-${NodeID}
  subNamePrefix:
    rootCoordSubNamePrefix: "rootCoord"
    proxySubNamePrefix: "proxy"
    queryNodeSubNamePrefix: "queryNode"
    dataNodeSubNamePrefix: "dataNode"
    dataCoordSubNamePrefix: "dataCoord"

common:
  defaultPartitionName: "_default" # default partition name for a collection
  defaultIndexName: "_default_idx" # default index name
  retentionDuration: 432000 # 5 days in seconds

knowhere:
  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
thanks again!
Suggestion
No response
Anything else?
No response