Open mindaugasrukas opened 1 year ago
The same issue has been reported for Temporal CLI version 0.5.0 (server 1.20.0, UI 2.10.3):
{"level":"error","ts":"2023-02-22T08:56:41.887-0800","msg":"Operation failed with internal error.","error":"ListNamespaces operation failed. Failed to get namespace rows. Error: SQL logic error: no such table: namespaces (1)","operation":"ListNamespaces","logging-call-at":"persistenceMetricClients.go:1171","stacktrace":"go.temporal.io/server/common/log.(zapLogger).Error\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/log/zap_logger.go:150\ngo.temporal.io/server/common/persistence.updateErrorMetric\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:1171\ngo.temporal.io/server/common/persistence.(metricEmitter).recordRequestMetrics\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:1148\ngo.temporal.io/server/common/persistence.(metadataPersistenceClient).ListNamespaces.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:683\ngo.temporal.io/server/common/persistence.(metadataPersistenceClient).ListNamespaces\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:685\ngo.temporal.io/server/common/persistence.(metadataRetryablePersistenceClient).ListNamespaces.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceRetryableClients.go:887\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/backoff/retry.go:199\ngo.temporal.io/server/common/persistence.(metadataRetryablePersistenceClient).ListNamespaces\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceRetryableClients.go:891\ngo.temporal.io/server/common/namespace.(registry).refreshNamespaces\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/namespace/registry.go:386\ngo.temporal.io/server/common/namespace.(registry).refreshLoop\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/namespace/registry.go:357\ngo.temporal.io/server/internal/goro.(Handle).Go.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/internal/goro/goro.go:64"}
{"level":"error","ts":"2023-02-22T08:56:41.892-0800","msg":"Operation failed with internal error.","error":"GetTaskQueue operation failed. Failed to check if task queue default-worker-tq of type Workflow existed. Error: SQL logic error: no such table: task_queues (1)","operation":"GetTaskQueue","logging-call-at":"persistenceMetricClients.go:1171","stacktrace":"go.temporal.io/server/common/log.(zapLogger).Error\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/log/zap_logger.go:150\ngo.temporal.io/server/common/persistence.updateErrorMetric\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:1171\ngo.temporal.io/server/common/persistence.(metricEmitter).recordRequestMetrics\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:1148\ngo.temporal.io/server/common/persistence.(taskPersistenceClient).GetTaskQueue.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:567\ngo.temporal.io/server/common/persistence.(taskPersistenceClient).GetTaskQueue\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/persistence/persistenceMetricClients.go:569\ngo.temporal.io/server/service/matching.(taskQueueDB).takeOverTaskQueueLocked\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/db.go:123\ngo.temporal.io/server/service/matching.(taskQueueDB).RenewLease\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/db.go:109\ngo.temporal.io/server/service/matching.(taskWriter).renewLeaseWithRetry.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/taskWriter.go:302\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/common/backoff/retry.go:199\ngo.temporal.io/server/service/matching.(taskWriter).renewLeaseWithRetry\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/taskWriter.go:306\ngo.temporal.io/server/service/matching.(taskWriter).initReadWriteState\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/taskWriter.go:131\ngo.temporal.io/server/service/matching.(taskWriter).taskWriterLoop\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/service/matching/taskWriter.go:221\ngo.temporal.io/server/internal/goro.(Handle).Go.func1\n\tgo.temporal.io/server@v1.18.1-0.20230207023301-52c3a9eefb06/internal/goro/goro.go:64"}
Posting some context from @yiminc:
Note that if the last database connection in the pool closes, the in-memory database is deleted. Make sure the max idle connection limit is > 0, and the connection lifetime is infinite.
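To make that concrete, here is a minimal Go sketch of what those pool settings look like with database/sql. This is not Temporal's actual code; the SQLite driver import and the DSN are assumptions, but the three pool knobs are the ones referred to above.

```go
package main

import (
	"database/sql"
	"log"

	_ "modernc.org/sqlite" // driver choice is an assumption; any SQLite driver behaves similarly
)

func main() {
	// A shared in-memory SQLite database lives only as long as at least one
	// connection to it is open; when the last connection closes, every table is gone.
	db, err := sql.Open("sqlite", "file::memory:?cache=shared")
	if err != nil {
		log.Fatal(err)
	}

	// Keep the pool from ever dropping to zero open connections.
	db.SetMaxIdleConns(1)    // must be > 0, otherwise idle connections are closed
	db.SetConnMaxLifetime(0) // 0 = connections are never closed due to age
	db.SetConnMaxIdleTime(0) // 0 = connections are never closed for being idle

	if _, err := db.Exec(`CREATE TABLE namespaces (id TEXT PRIMARY KEY)`); err != nil {
		log.Fatal(err)
	}
	// With the settings above, later queries keep seeing the table. If the pool
	// were allowed to drain, they could instead fail with "no such table: namespaces".
}
```

The specific driver doesn't matter much; the point is that as long as at least one pooled connection stays open, the shared in-memory database and its tables survive.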
I have a similar issue:
error while fetching cluster metadata: operation GetClusterMetadata encountered table cluster_metadata_info does not exist
Linking for visibility: https://github.com/temporalio/cli/issues/124
I've been observing multiple flakes of this error message in the TS SDK's integration tests recently: 11 times in the last 3 weeks, versus none before that (as far as I can see in the GHA logs).
In the context of those CI jobs, it only happens with the CLI Dev Server started at the GHA job level (i.e. not with Dev Server instances started using the SDK's built-in TestWorkflowEnvironment), using CLI 0.12.0 and 0.13.2. Interestingly, 9 times out of 11, the "error" started at almost the same place during the tests, in the "Worker Lifecycle" tests.
I have modified the CI workflow to retain the server's logs on failure, so hopefully I'll be able to provide more data on this soon.
Yesterday I launched the single process for development: temporal server start-dev. Usually I keep that running for a couple of days without any issues, but today I got an HTTP 503 response on the web UI, so I had to restart the process. I'm still trying to figure out how to reproduce this, or whether it's a real issue, so I'm leaving it here for the record in case it repeats or we can better understand the problem.
Some log snippets:
Expected Behavior
No issues.
Actual Behavior
A single process failed due to a missing DB table.
Steps to Reproduce the Problem
Unknown. I was not able to construct reproducible steps.
What I did initially:
% temporal server start-dev
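For reference, here is a hypothetical Go sketch (not the dev server's actual code; the driver, DSN, and table name are just for illustration) of the SQLite behavior @yiminc describes above, which would produce exactly this kind of "no such table" error:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "modernc.org/sqlite" // driver choice is an assumption
)

func main() {
	db, err := sql.Open("sqlite", "file::memory:?cache=shared")
	if err != nil {
		panic(err)
	}
	db.SetMaxIdleConns(0) // allow the pool to close every connection once idle

	if _, err := db.Exec(`CREATE TABLE namespaces (id TEXT)`); err != nil {
		panic(err)
	}
	// After Exec returns, its connection is released and, with no idle slots,
	// closed. That was the last open connection, so the shared in-memory
	// database (and the namespaces table with it) is deleted.

	var n int
	err = db.QueryRow(`SELECT count(*) FROM namespaces`).Scan(&n)
	fmt.Println(err) // typically: SQL logic error: no such table: namespaces
}
```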
Specifications