milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [Nightly] Milvus install failed in cluster-kafka pipeline #20720

Closed. NicoYuan1986 closed this issue 1 year ago.

NicoYuan1986 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: a54b40e
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.2.0.dev72
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The Milvus install failed in the nightly cluster-kafka pipeline.

script returned exit code 1

Expected Behavior

The installation should succeed.

Steps To Reproduce

The first time (two days ago): https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/193/pipeline/81

The second time (yesterday): https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/194/pipeline/81/

Milvus Log

[2022-11-18T18:09:59.536Z] -----------------milvus config --------------------

[2022-11-18T18:09:59.536Z] + kubectl get configmap mdk-194-n-milvus -n milvus-ci -o 'jsonpath={$.data}'

[2022-11-18T18:09:59.537Z] {"milvus.yaml":"# Copyright (C) 2019-2021 Zilliz. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance\n# with the License. You may obtain a copy of the License at\n#\n# [http://www.apache.org/licenses/LICENSE-2.0\n#\n#](http://www.apache.org/licenses/LICENSE-2.0/n#\n#) Unless required by applicable law or agreed to in writing, software distributed under the License\n# is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express\n# or implied. See the License for the specific language governing permissions and limitations under the License.\n\netcd:\n  endpoints:\n    - mdk-194-n-etcd:2379\n  rootPath: by-dev\n  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath\n  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath\n\nmetastore:\n  type: etcd\n\nminio:\n  address: mdk-194-n-minio\n  port: 9000\n  accessKeyID: minioadmin\n  secretAccessKey: minioadmin\n  useSSL: false\n  bucketName: milvus-bucket\n  rootPath: file\n  useIAM: false\n  iamEndpoint: \n\nkafka:\n  brokerList: mdk-194-n-kafka:9092\n\nrootCoord:\n  address: mdk-194-n-milvus-rootcoord\n  port: 53100\n\n  dmlChannelNum: \"256\"  # The number of dml channels created at system startup\n  maxPartitionNum: \"4096\"  # Maximum number of partitions in a collection\n  minSegmentSizeToEnableIndex: \"1024\"  # It's a threshold. When the segment size is less than this value, the segment will not be indexed\n\nproxy:\n  port: 19530\n  internalPort: 19529\n  http:\n    enabled: false # Whether to enable the http server\n    debug_mode: false # Whether to enable http server debug mode\n\n  timeTickInterval: \"200\"  # ms, the interval that proxy synchronize the time tick\n  msgStream:\n    timeTick:\n      bufSize: 512\n  maxNameLength: 255  # Maximum length of name for a collection or alias\n  maxFieldNum: \"256\"     # max field number of a collection\n  maxDimension: 32768  # Maximum dimension of a vector\n  maxShardNum: \"256\"  # Maximum number of shards in a collection\n  maxTaskNum: \"1024\"  # max task number of proxy task queue\n\nqueryCoord:\n  address: mdk-194-n-milvus-querycoord\n  port: 19531\n  autoHandoff: true\n  autoBalance: true\n  overloadedMemoryThresholdPercentage: 90\n  balanceIntervalSeconds: 60\n  memoryUsageMaxDifferencePercentage: 30\n  checkInterval: \"1000\"\n  channelTaskTimeout: \"60000\"\n  segmentTaskTimeout: \"120000\"\n  distPullInterval: \"500\"\n  loadTimeoutSeconds: \"600\"\n  checkHandoffInterval: \"5000\"\n  taskMergeCap: \"8\"\n\nqueryNode:\n  port: 21123\n  loadMemoryUsageFactor: 3 # The multiply factor of calculating the memory usage while loading segments\n  enableDisk: true # Enable querynode load disk index, and search on disk index\n\n  stats:\n    publishInterval: 1000 # Interval for querynode to report node information (milliseconds)\n  dataSync:\n    flowGraph:\n      maxQueueLength: 1024 # Maximum length of task queue in flowgraph\n      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph\n  segcore:\n    chunkRows: 1024 # The number of vectors in a chunk.\n    smallIndex:\n      nlist: 128 # small index nlist, recommend to set sqrt(chunkRows), must smaller than chunkRows/8\n      nprobe: 16 # nprobe to search small index, based on your accuracy requirement, must smaller than nlist\n  cache:\n    enabled: true\n    memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024\n\n  scheduler:\n    
receiveChanSize: 10240\n    unsolvedQueueSize: 10240\n    maxReadConcurrency: 0 # maximum concurrency of read task. if set to less or equal 0, it means no uppper limit.\n    cpuRatio: 10.0 # ratio used to estimate read task cpu usage.\n\n  grouping:\n    enabled: true\n    maxNQ: \"1000\"\n    topKMergeRatio: 10.0\n\n\nindexCoord:\n  address: mdk-194-n-milvus-indexcoord\n  port: 31000\n\n  gc:\n    interval: 1  # gc interval in seconds\n\nindexNode:\n  port: 21121\n  enableDisk: true # Enable index node build disk vector index\n\n  scheduler:\n    buildParallel: 1 # one index node can run how many index tasks in parallel\n\n\ndataCoord:\n  address: mdk-194-n-milvus-datacoord\n  port: 13333\n\n  enableCompaction: true\n  enableGarbageCollection: true\n\n  segment:\n    maxSize: \"512\"  # Maximum size of a segment in MB\n    diskSegmentMaxSize: \"2048\" # Maximum size of segment in MB for disk index collection\n    sealProportion: \"0.25\" # It's the minimum proportion for a segment which can be sealed\n    assignmentExpiration: 2000 # The time of the assignment expiration in ms\n    maxLife: \"3600\" # The max lifetime of segment in seconds, 60*60\n    maxIdleTime: \"300\" # The maximum idle time of a growing segment in seconds, 5*60\n    minSizeFromIdleToSealed: \"16\"  # The minimum size in MB of segment which can be idle from sealed\n    smallProportion: \"0.9\" # The proportion for a sealed segment, which would not be compacted\n\n  compaction:\n    enableAutoCompaction: true\n\n  gc:\n    interval: 60 # gc interval in seconds\n    missingTolerance: 86400 # file meta missing tolerance duration in seconds, 1 day\n    dropTolerance: 86400 # file belongs to dropped entity tolerance duration in seconds, 1 day\n\ndataNode:\n  port: 21124\n\n  dataSync:\n    flowGraph:\n      maxQueueLength: 1024  # Maximum length of task queue in flowgraph\n      maxParallelism: 1024  # Maximum number of tasks executed in parallel in the flowgraph\n  flush:\n    insertBufSize: \"16777216\"  # Bytes, 16 MB\n\nlog:\n  level: debug\n  file:\n    rootPath: \"\"\n    maxSize: 300\n    maxAge: 10\n    maxBackups: 20\n  format: text\n\ngrpc:\n  log:\n    level: WARNING\n\n  serverMaxRecvSize: 2147483647 # math.MaxInt32\n  serverMaxSendSize: 2147483647 # math.MaxInt32\n  clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024\n  clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024\n\n  client:\n    dialTimeout: 5000\n    keepAliveTime: 10000\n    keepAliveTimeout: 20000\n    maxMaxAttempts: 5\n    initialBackOff: 1.0\n    maxBackoff: 60.0\n    backoffMultiplier: 2.0\n\ncommon:\n  # Channel name generation rule: ${namePrefix}-${ChannelIdx}\n  chanNamePrefix:\n    cluster: by-dev\n    rootCoordTimeTick: \"rootcoord-timetick\"\n    rootCoordStatistics: \"rootcoord-statistics\"\n    rootCoordDml: \"rootcoord-dml\"\n    rootCoordDelta: \"rootcoord-delta\"\n    search: \"search\"\n    searchResult: \"searchResult\"\n    queryTimeTick: \"queryTimeTick\"\n    queryNodeStats: \"query-node-stats\"\n    # Cmd for loadIndex, flush, etc...\n    cmd: \"cmd\"\n    dataCoordStatistic: \"datacoord-statistics-channel\"\n    dataCoordTimeTick: \"datacoord-timetick-channel\"\n    dataCoordSegmentInfo: \"segment-info-channel\"\n\n  # Sub name generation rule: ${subNamePrefix}-${NodeID}\n  subNamePrefix:\n    rootCoordSubNamePrefix: \"rootCoord\"\n    proxySubNamePrefix: \"proxy\"\n    queryNodeSubNamePrefix: \"queryNode\"\n    dataNodeSubNamePrefix: \"dataNode\"\n    dataCoordSubNamePrefix: \"dataCoord\"\n\n  defaultPartitionName: 
\"_default\"  # default partition name for a collection\n  defaultIndexName: \"_default_idx\"  # default index name\n  retentionDuration: 0\n  entityExpiration:  -1     # Entity expiration in seconds, CAUTION make sure entityExpiration \u003e= retentionDuration and -1 means never expire\n\n  gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.\n  security:\n    authorizationEnabled: false\n  simdType: auto  # default to auto\n  indexSliceSize: 16 # MB\n\n  storageType: minio\n  mem_purge_ratio: 0.2 # in Linux os, if memory-fragmentation-size \u003e= used-memory * ${mem_purge_ratio}, then do `malloc_trim`\n\nquotaAndLimits:\n  enabled: true\n  quotaCenterCollectInterval: 3  # seconds\n\n  ddl:\n    enabled: false\n    collectionRate: -1\n    partitionRate: -1\n\n  indexRate:\n    enabled: false\n    max: -1\n  flushRate:\n    enabled: false\n    max: -1\n  compactionRate:\n    enabled: false\n    max: -1\n\n  dml:\n    enabled: false\n    insertRate:\n      max: -1\n    deleteRate:\n      max: -1\n    bulkLoadRate:\n      max: -1\n\n  dql:\n    enabled: false\n    searchRate:\n      max: -1\n    queryRate:\n      max: -1\n\n  limitWriting:\n    forceDeny: false\n\n    ttProtection:\n      enabled: true\n      maxTimeTickDelay: 300  # seconds\n\n    memProtection:\n      enabled: true\n      dataNodeMemoryLowWaterLevel: 0.85\n      dataNodeMemoryHighWaterLevel: 0.95\n      queryNodeMemoryLowWaterLevel: 0.85\n      queryNodeMemoryHighWaterLevel: 0.95\n    diskProtection:\n      enabled: true\n      diskQuota: -1\n\n  limitReading:\n    forceDeny: false\n\n    queueProtection:\n      enabled: false\n      nqInQueueThreshold: -1\n      queueLatencyThreshold: -1\n\n    resultProtection:\n      enabled: false\n      maxReadResultRate: -1\n\n    coolOffSpeed: 0.9\n"}+ exit 1

script returned exit code 1
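For readability, the dependency endpoints buried in the escaped milvus.yaml dump above boil down to roughly the following excerpt. The mdk-194-n-* service names come from this particular CI run (cf. the configmap mdk-194-n-milvus in the milvus-ci namespace); this is only a partial reconstruction, not the full config.

```yaml
# Excerpt reconstructed from the configmap dump above (not the full milvus.yaml).
etcd:
  endpoints:
    - mdk-194-n-etcd:2379            # metadata store
  rootPath: by-dev

metastore:
  type: etcd

minio:
  address: mdk-194-n-minio           # object storage backend
  port: 9000
  bucketName: milvus-bucket

kafka:
  brokerList: mdk-194-n-kafka:9092   # message queue endpoint for the cluster-kafka pipeline
```

This confirms the deployment is wired to Kafka rather than Pulsar, matching the pipeline name; the failing install step itself only reports `script returned exit code 1`, so the root cause is not visible from the config dump alone.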

Anything else?

No response

yanliang567 commented 1 year ago

/assign @Bennu-Li Please help take a look.

/unassign

Bennu-Li commented 1 year ago

/assign @binbinlv

jaime0815 commented 1 year ago

/assign @jaime0815