uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
https://cadenceworkflow.io
MIT License

cadence-history service transfer_queue_processor Errors #4723

Open Smithx10 opened 2 years ago

Smithx10 commented 2 years ago

Version of Cadence server and client (which language)

Server Version:

[root@cad-vm-3 ~]# cadence-server -v
cadence version
Release version: v0.16.0
Build commit: 2022-01-27T20:09:27+02:00-a964dfbd
Max Support CLI feature version: 1.7.0
Max Support GoSDK feature version: 1.7.0
Max Support JavaSDK feature version: 1.5.0
Note: Feature version is for compatibility checking between server and clients if enabled feature checking. Server is always backward compatible to older CLI versions, but not accepting newer than it can support.

Bug:

{"level":"error","ts":"2022-02-01T20:39:48.255Z","msg":"Error complete transfer task","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"transfer-queue-processor","error":"queue processor has been shutdown","logging-call-at":"transfer_queue_processor.go:362","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/root/git/git/cadence/common/log/loggerimpl/logger.go:131\ngithub.com/uber/cadence/service/history/queue.(*transferQueueProcessor).completeTransferLoop\n\t/root/git/git/cadence/service/history/queue/transfer_queue_processor.go:362"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"Timer queue processor state changed","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"timer-queue-processor","cluster-name":"primary","component":"timer-queue-processor","lifecycle":"Stopping","logging-call-at":"timer_queue_processor_base.go:176"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"Task redispatcher stopped.","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"timer-queue-processor","cluster-name":"primary","component":"timer-queue-processor","lifecycle":"Stopped","logging-call-at":"redispatcher.go:140"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"Timer queue processor state changed","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"timer-queue-processor","cluster-name":"primary","component":"timer-queue-processor","lifecycle":"Stopped","logging-call-at":"timer_queue_processor_base.go:192"}
{"level":"error","ts":"2022-02-01T20:39:48.256Z","msg":"Error complete timer task","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"timer-queue-processor","error":"queue processor has been shutdown","logging-call-at":"timer_queue_processor.go:356","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/root/git/git/cadence/common/log/loggerimpl/logger.go:131\ngithub.com/uber/cadence/service/history/queue.(*timerQueueProcessor).completeTimerLoop\n\t/root/git/git/cadence/service/history/queue/timer_queue_processor.go:356"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"Marker notifier state changed","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"failover-marker-notifier","lifecycle":"Stopped","logging-call-at":"marker_notifier.go:98"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"History engine state changed","service":"cadence-history","shard-id":24,"address":"[::]:7934","component":"history-engine","lifecycle":"Stopped","logging-call-at":"historyEngine.go:375"}
{"level":"info","ts":"2022-02-01T20:39:48.256Z","msg":"Shard engine state changed","service":"cadence-history","shard-id":24,"address":"[::]:7934","lifecycle":"Stopped","component":"shard-engine","logging-call-at":"controller.go:492"}
{"level":"warn","ts":"2022-02-01T20:39:48.391Z","msg":"Closing shard: updateShardInfoLocked failed due to stolen shard.","service":"cadence-history","shard-id":40,"address":"[::]:7934","shard-id":40,"error":"Failed to update shard. Previous range ID: 78; new range ID: 81","logging-call-at":"context.go:1172"}
{"level":"error","ts":"2022-02-01T20:39:48.391Z","msg":"Error persisting processing queue states","service":"cadence-history","shard-id":40,"address":"[::]:7934","component":"transfer-queue-processor","cluster-name":"primary","component":"transfer-queue-processor","error":"Failed to update shard. Previous range ID: 78; new range ID: 81","operation-result":"OperationFailed","logging-call-at":"processor_base.go:215","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/root/git/git/cadence/common/log/loggerimpl/logger.go:131\ngithub.com/uber/cadence/service/history/queue.(*processorBase).updateAckLevel\n\t/root/git/git/cadence/service/history/queue/processor_base.go:215\ngithub.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).processorPump\n\t/root/git/git/cadence/service/history/queue/transfer_queue_processor_base.go:338"}
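
For anyone triaging a larger capture of these logs, a minimal sketch like the following may help group the warn/error lines by shard, component, and error text. It assumes one JSON object per line, as in the excerpt above. The function name `summarize` is hypothetical, and the sample lines are abbreviated copies of the entries above (most fields dropped), not additional log output.

```python
import json
from collections import Counter


def summarize(lines):
    """Count warn/error entries grouped by (shard-id, component, error text)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("level") not in ("warn", "error"):
            continue
        # Fall back to the message when the entry carries no "error" field.
        reason = entry.get("error", entry.get("msg"))
        counts[(entry.get("shard-id"), entry.get("component"), reason)] += 1
    return counts


# Abbreviated copies of three entries from the excerpt above.
sample = [
    '{"level":"error","msg":"Error complete transfer task","shard-id":24,'
    '"component":"transfer-queue-processor","error":"queue processor has been shutdown"}',
    '{"level":"info","msg":"History engine state changed","shard-id":24,'
    '"component":"history-engine","lifecycle":"Stopped"}',
    '{"level":"warn","msg":"Closing shard: updateShardInfoLocked failed due to stolen shard.",'
    '"shard-id":40,"error":"Failed to update shard. Previous range ID: 78; new range ID: 81"}',
]

summary = summarize(sample)
for (shard, component, reason), n in sorted(summary.items(), key=str):
    print(shard, component, reason, n)
```

Note that some entries above repeat a key (for example "component" or "shard-id" appears twice); Python's `json.loads` keeps the last value for a repeated key, which is harmless here because the repeated values are identical.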

Config:

[root@cad-vm-1 ~]# cat /etc/cadence/config/base.yaml
log:
    stdout: true
    level: "debug"

persistence:
    numHistoryShards: 100
    defaultStore: default
    visibilityStore: visibility
    datastores:
        default:
            sql:
                pluginName: "postgres"
                encodingType: "thriftrw"
                decodingTypes: ["thriftrw"]
                databaseName: "cadence"
                connectAddr: "cadence-primary.pg.bdf-cloud.iqvia.net:5432"
                connectProtocol: "tcp"
                user: "postgres"
                password: "test123!"
                maxConns: 20
                maxIdleConns: 20
                maxConnLifetime: "1h"
        visibility:
            sql:
                pluginName: "postgres"
                encodingType: "thriftrw"
                decodingTypes: ["thriftrw"]
                databaseName: "cadence_visibility"
                connectAddr: "cadence-primary.pg.bdf-cloud.iqvia.net:5432"
                connectProtocol: "tcp"
                user: "postgres"
                password: "test123!"
                maxConns: 20
                maxIdleConns: 20
                maxConnLifetime: "1h"

ringpop:
    name: cadence
    bootstrapMode: "dns"
    bootstrapHosts:
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7833
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7933
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7834
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7934
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7835
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7935
      - cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7939
    maxJoinDuration: 20s

services:
    frontend:
        rpc:
            port: 7933
            grpcPort: 7833
            bindOnIP: 0.0.0.0
    history:
        rpc:
            port: 7934
            grpcPort: 7834
            bindOnIP: 0.0.0.0
    matching:
        rpc:
            port: 7935
            grpcPort: 7835
            bindOnIP: 0.0.0.0
    worker:
        rpc:
            port: 7939
            bindOnIP: 0.0.0.0

clusterGroupMetadata:
    enableGlobalDomain: true
    clusterRedirectionPolicy:
        policy: all-domain-apis-forwarding
    failoverVersionIncrement: 10
    primaryClusterName: "primary"
    currentClusterName: "primary"
    clusterGroup:
        primary:
            enabled: true
            initialFailoverVersion: 0
            rpcName: "cadence-frontend"
            rpcAddress: cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7933
            rpcTransport: "grpc"
            authorizationProvider:
                enable: false
                type: "OAuthAuthorization"
                privateKey:

archival:
  history:
    status: disabled
    enableRead: false
    provider:
      filestore:
        fileMode:
        dirMode:
  visibility:
    status: disabled
    enableRead: false
    provider:
      filestore:
        fileMode:
        dirMode:

domainDefaults:
  archival:
    history:
      status: disabled
      URI:
    visibility:
      status: disabled
      URI:

kafka:
    tls:
        enabled: false
    clusters:
        test:
            brokers:
                - :9092
    topics:
        cadence-visibility-dev:
            cluster: test
        cadence-visibility-dev-dlq:
            cluster: test
    applications:
        visibility:
            topic: cadence-visibility-dev
            dlq-topic: cadence-visibility-dev-dlq

publicClient:
    hostPort: cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net:7933

authorization:
    oauthAuthorizer:
        enable: false
        maxJwtTTL: 86400
        jwtCredentials:
            algorithm: "RS256"
            publicKey:

arch@9d46dd2d-1178-ce83-cbc4-d396e4a24060 ~/g/cadence-deployment ❯❯❯ dig cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net +short
10.91.194.142
10.91.194.77
10.91.194.147
10.91.194.4
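
To read the ringpop section against the dig output, a small sketch like the one below may be useful. It expands each hostname:port entry in bootstrapHosts into IP:port pairs, assuming every address returned by DNS is paired with every listed port. That pairing rule is an assumption for illustration only, not a statement of how Cadence's DNS bootstrap mode actually resolves seeds. `expand_seeds` and `resolved_ips` are hypothetical names, and the addresses are hard-coded from the output above rather than looked up.

```python
def expand_seeds(bootstrap_hosts, resolved_ips):
    """Pair every resolved IP with the port of each host:port entry (assumed rule)."""
    seeds = []
    for entry in bootstrap_hosts:
        host, _, port = entry.rpartition(":")
        for ip in resolved_ips.get(host, []):
            seeds.append(f"{ip}:{port}")
    return seeds


HOST = "cadence-vm.svc.bdf-cadence.us-east.bdf-cloud.iqvia.net"

# Ports copied from the bootstrapHosts list in the config above.
bootstrap_hosts = [f"{HOST}:{p}" for p in (7833, 7933, 7834, 7934, 7835, 7935, 7939)]

# Addresses copied from the dig output above (no live DNS lookup is performed).
resolved_ips = {HOST: ["10.91.194.142", "10.91.194.77", "10.91.194.147", "10.91.194.4"]}

seeds = expand_seeds(bootstrap_hosts, resolved_ips)
print(len(seeds))  # 7 ports x 4 addresses
for seed in seeds[:4]:
    print(seed)
```

Under that assumption, the list covers all four services on every VM, which gives a concrete set of endpoints to check for reachability.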

talha-naeem1 commented 7 months ago

Getting the same error in cadence-history:

{"level":"error","ts":"2024-03-22T12:09:29.036Z","msg":"Error complete transfer task","service":"cadence-history","shard-id":56,"address":"18.0.42.82:7934","component":"transfer-queue-processor","error":"queue processor has been shutdown","logging-call-at":"transfer_queue_processor.go:362","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/cadence/common/log/loggerimpl/logger.go:131\ngithub.com/uber/cadence/service/history/queue.(*transferQueueProcessor).completeTransferLoop\n\t/cadence/service/history/queue/transfer_queue_processor.go:362"}

Did anyone find a solution for this?