temporalio / temporal

Temporal service
https://docs.temporal.io
MIT License
Archival does not show list of archived workflows #5624

Open jmichaelsmotus opened 8 months ago

jmichaelsmotus commented 8 months ago

Expected Behavior

Viewing the Archival tab for a namespace which has archived workflows should display them.

Actual Behavior

The Archival tab is displaying the message "No Workflows running in this Namespace" even though there are archived workflows present in the configured archival S3 bucket. The GET call to api/v1/namespaces/<namespace>/archived-workflows?query= is failing with an HTTP 400.

Steps to Reproduce the Problem

  1. Create an S3 bucket for Archival.
  2. Ensure that Temporal's K8s service account has the below IAM permissions, changing the bucket name here to match.
    {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:GetObject",
                    "s3:GetObjectAcl",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::temporal-workflow-archive-bucket",
                    "arn:aws:s3:::temporal-workflow-archive-bucket/*"
                ]
            }
        ]
    }
  3. Add the below configuration to the Temporal Helm chart, changing the bucket name here to match.

    config:
    archival:
      history:
        state: enabled
        enableRead: true
        provider:
          s3store:
            region: <aws region>
      visibility:
        state: enabled
        enableRead: true
        provider: 
          s3store:
            region: <aws region>
    
    namespaceDefaults:
      archival:
        history:
          state: enabled
          URI: s3://temporal-workflow-archive-bucket
        visibility:
          state: enabled
          URI: s3://temporal-workflow-archive-bucket
  4. Turn on archival and visibility for a namespace that has workflows running, and wait for the retention period.
  5. View the archival bucket and observe that there are files present for those archived workflows.
  6. Go to the Archival tab for that namespace and observe the message "No Workflows running in this Namespace" and the failed call to api/v1/namespaces/<namespace>/archived-workflows?query=.

Specifications

jmichaelsmotus commented 8 months ago

Some more information about this issue:

The call to api/v1/namespaces/banking/archived-workflows?query= which returns an HTTP 400 is giving the following error:

{
  "code": 3,
  "message": "Cluster is not configured for reading archived visibility records."
}
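
For context on why this surfaces as an HTTP 400: code 3 is the gRPC status code for InvalidArgument, which HTTP gateways conventionally translate to 400. Below is a minimal sketch for decoding such an error body. The `apiError` type, the `describe` helper, and the small code table are my own illustration (only a few codes are listed), not anything taken from the Temporal codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError matches the shape of the JSON error body shown above.
type apiError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// A few standard gRPC status codes; the table is deliberately incomplete.
var grpcCodeNames = map[int]string{
	0:  "OK",
	3:  "InvalidArgument",
	5:  "NotFound",
	7:  "PermissionDenied",
	14: "Unavailable",
}

// describe decodes an error body and labels its numeric code.
func describe(body []byte) (string, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", err
	}
	name, ok := grpcCodeNames[e.Code]
	if !ok {
		name = "unmapped"
	}
	return fmt.Sprintf("%s (%d): %s", name, e.Code, e.Message), nil
}

func main() {
	body := []byte(`{"code": 3, "message": "Cluster is not configured for reading archived visibility records."}`)
	summary, err := describe(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```

So the 400 is the frontend rejecting the request up front, not a failure while reading from the bucket.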

This error message is defined here:

errClusterIsNotConfiguredForReadingArchivalVisibility = serviceerror.NewInvalidArgument("Cluster is not configured for reading archived visibility records.")

https://github.com/temporalio/temporal/blob/main/service/frontend/errors.go#L62

And it seems to be used only here:

if !wh.archivalMetadata.GetVisibilityConfig().ReadEnabled() {
    return nil, errClusterIsNotConfiguredForReadingArchivalVisibility
}

https://github.com/temporalio/temporal/blob/main/service/frontend/workflow_handler.go#L2131

It looks like ultimately the value of wh.archivalMetadata.GetVisibilityConfig().ReadEnabled() comes from the configuration loaded from the config yaml, since cfg.Archival.Visibility.EnableRead maps to the yaml key path archival.visibility.enableRead:

func ArchivalMetadataProvider(dc *dynamicconfig.Collection, cfg *config.Config) archiver.ArchivalMetadata {
    return archiver.NewArchivalMetadata(
        dc,
        cfg.Archival.History.State,
        cfg.Archival.History.EnableRead,
        cfg.Archival.Visibility.State,
        cfg.Archival.Visibility.EnableRead,
        &cfg.NamespaceDefaults.Archival,
    )
}

https://github.com/temporalio/temporal/blob/main/common/resource/fx.go#L350
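
To make that chain concrete, here is a stripped-down sketch of the guard as I understand it. `visibilityConfig` and `listArchived` are hypothetical stand-ins I wrote for illustration, and the real `ReadEnabled()` may take more than this one boolean into account.

```go
package main

import (
	"errors"
	"fmt"
)

// Same message as errClusterIsNotConfiguredForReadingArchivalVisibility.
var errReadDisabled = errors.New("Cluster is not configured for reading archived visibility records.")

// visibilityConfig is a hypothetical, simplified stand-in for the value
// returned by archivalMetadata.GetVisibilityConfig().
type visibilityConfig struct {
	enableRead bool // would be populated from archival.visibility.enableRead
}

// ReadEnabled reports whether reads are allowed (assumption: the flag alone decides).
func (c visibilityConfig) ReadEnabled() bool { return c.enableRead }

// listArchived mimics only the guard at the top of the frontend handler.
func listArchived(c visibilityConfig) ([]string, error) {
	if !c.ReadEnabled() {
		return nil, errReadDisabled
	}
	return []string{}, nil // the real handler would query the archiver here
}

func main() {
	for _, enabled := range []bool{false, true} {
		_, err := listArchived(visibilityConfig{enableRead: enabled})
		fmt.Printf("enableRead=%v -> err=%v\n", enabled, err)
	}
}
```

If the flag never makes it into the config struct, the guard fails no matter what is in the bucket.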

However, I can see from the temporal-frontend Pod's logs that it's loading config/docker.yaml:

2024/03/29 15:52:04 Loading config; env=docker,zone=,configDir=config
2024/03/29 15:52:04 Loading config files=[config/docker.yaml]

And if I shell into the container, I can see the config file does have the correct configuration:

temporal-frontend-f56454dff-7dr26:/etc/temporal$ cat config/docker.yaml
<...>
archival:
  history:
    state: "enabled"
    enableRead: true
    provider:
      s3store:
        logLevel: 0
        region: #######
  visibility:
    state: "enabled"
    enabledRead: true
    provider:
      s3store:
        logLevel: 0
        region: #######

namespaceDefaults:
  archival:
    history:
      URI: #######
      state: enabled
    visibility:
      URI: #######
      state: enabled
<...>

So why is temporal-frontend acting like this is not configured?

gad26032 commented 7 months ago

Same issue, but I use the local file system as the backend.

  archival:
    history:
      state: "enabled"
      enableRead: true
      provider:
        filestore:
          fileMode: "0666"
          dirMode: "0766"
    visibility:
      state: "enabled"
      enableRead: true
      provider:
        filestore:
          fileMode: "0666"
          dirMode: "0766"

  namespaceDefaults:
    archival:
      history:
        state: "enabled"
        URI: "file:///home/temporal/temporal_archival/data"
      visibility:
        state: "enabled"
        URI: "file:///home/temporal/temporal_vis_archival/data"

Output of describe namespace:

temporal-admintools-94844b8c5-v5957:/etc/temporal$ temporal operator namespace describe dev
  NamespaceInfo.Name                    dev                                               
  NamespaceInfo.Id                      c975b631-f109-41f7-865a-04b341fbe922              
  NamespaceInfo.Description                                                               
  NamespaceInfo.OwnerEmail                                                                
  NamespaceInfo.State                   Registered                                        
  NamespaceInfo.Data                    map[]                                             
  Config.WorkflowExecutionRetentionTtl  24h0m0s                                           
  ReplicationConfig.ActiveClusterName   active                                            
  ReplicationConfig.Clusters            [&ClusterReplicationConfig{ClusterName:active,}]  
  Config.HistoryArchivalState           Enabled                                           
  Config.VisibilityArchivalState        Enabled                                           
  IsGlobalNamespace                     false                                             
  FailoverVersion                                                                      0  
  FailoverHistory                       []                                                

On the temporal-history pod:

temporal-history-6c4ffd4497-gdjzn:~$ tree 
.
├── temporal_archival
│   └── data
│       ├── 9432711404182916402104093526875229053638092447613532531101_0.history
│       ├── 94327114041829164021084163317621728861618148438391857267770_0.history
│       ├── 94327114041829164021091822365132548957316345918779004043580_0.history
...
│       ├── 943271140418291640294944933450628752312303702040985760010_0.history
│       ├── 943271140418291640295710937541081983910480885637690598829_0.history
│       └── 9432711404182916402970473396858808043215067898247463723775_0.history
└── temporal_vis_archival
    └── data
        └── c975b631-f109-41f7-865a-04b341fbe922
            ├── 1714523543567043851_3107041130708393426.visibility
            ├── 1714523665551876245_8603905498755678542.visibility
            ├── 1714523666297386886_281047780464377597.visibility
...
            ├── 1714526336198869418_5221394463637855399.visibility
            ├── 1714527023940092587_4563104817072754649.visibility
            ├── 1714527023998615275_17488082703656383857.visibility
            └── 1714527261355949256_15312822523284306755.visibility

5 directories, 64 files

But when I try to get the history, I get this:

temporal-admintools-94844b8c5-v5957:/etc/temporal$ temporal workflow list --archived dev
Error: unable to list archived workflow executions: Namespace is not configured for visibility archival.
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

The Web UI just doesn't show anything on the Archived page.

UPD 2024-04-02:

I decided to redeploy Temporal from scratch. After that, the logs don't show any errors, but the history is still not shown on the Archive page. The request from the admin-tools container now works without errors but returns nothing:

temporal-admintools-94844b8c5-pptl4:/etc/temporal$ temporal workflow list --archived dev
temporal-admintools-94844b8c5-pptl4:/etc/temporal$ 

The history container does have history files:

temporal-history-6c4ffd4497-7jxfk:~$ tree
.
├── temporal_archival
│   └── data
│       ├── 3153029508883409284100556632954027255031624317270243527795_0.history
│       ├── 315302950888340928410077844079520042907189990798703046629_0.history
│       ├── 31530295088834092841014179372472256450213867944691256176036_0.history
...      ...
│       ├── 315302950888340928494255717878218597641732674283183740556_0.history
│       ├── 31530295088834092849742622859503433869401773985681693035_0.history
│       └── 3153029508883409284992368164740503790016816529257862930874_0.history
└── temporal_vis_archival
    └── data
        └── 71c27c7d-edca-4ed2-b986-1bbcd9aa94c5
            ├── 1714567282922524643_8163139250068973546.visibility
            ├── 1714568230253421245_10586275668407886897.visibility
            ├── 1714568884468711750_1733191977649475147.visibility
             ...
            ├── 1714613295491251227_14703381710616799526.visibility
            ├── 1714613318262705922_1624317270243527795.visibility
            └── 1714613512563957127_16827496660219090821.visibility

5 directories, 74 files
temporal-history-6c4ffd4497-7jxfk:~$ 

Also, I've noticed that the temporal-system namespace has odd workflows that seem stuck (screenshot: Selection_144).

Is this expected behavior, or do I have a misconfiguration?

Any clues?