portainer / portainer

Making Docker and Kubernetes management easy.
https://www.portainer.io
zlib License
31.01k stars 2.48k forks

Nomad dashboard: Unable to list Nomad jobs: Unable to list allocations #8801

Open soupdiver opened 1 year ago

soupdiver commented 1 year ago

Bug description I set up an Edge Agent to connect to my Nomad cluster. The setup looks fine: I can see the Nomad cluster under Environments and can open its dashboard. But when I click "Nomad Jobs" I get errors from the API.

{"message":"Unable to list allocations","details":"failed to get the latest deployment for job home-heimdall in namespace default"}

It complains about one service after another: once I stop and purge one job, the next one throws the same error.
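For context (this is my sketch of the failure mode, not something stated in the issue): Nomad's GET /v1/job/:job_id/deployment endpoint returns the literal JSON body `null` when a job has never produced a deployment, so a Go client that decodes the body into a pointer gets nil back. A minimal, hypothetical decode (the `Deployment` struct here is trimmed down, not Portainer's actual type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Deployment mirrors a fragment of the object returned by Nomad's
// GET /v1/job/:job_id/deployment endpoint.
type Deployment struct {
	ID     string `json:"ID"`
	Status string `json:"Status"`
}

// decodeLatestDeployment decodes the endpoint's response body. When the job
// has never produced a deployment, the body is the literal `null`, and the
// returned pointer is nil with no error.
func decodeLatestDeployment(body []byte) (*Deployment, error) {
	var d *Deployment
	if err := json.Unmarshal(body, &d); err != nil {
		return nil, err
	}
	return d, nil
}

func main() {
	d, err := decodeLatestDeployment([]byte("null"))
	fmt.Println(d == nil, err) // a caller must nil-check before dereferencing
}
```

A caller that assumes the pointer is always non-nil would fail in exactly the "failed to get the latest deployment" way reported above.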

Expected behavior No errors.

Portainer Logs The only errors in the logs are the following, but I think they are unrelated:

2023/04/17 06:10PM ERR github.com/portainer/portainer-ee/api/internal/endpointutils/endpointutils.go:173 > error while detecting storage classes | error="unsupported environment type" stack_trace=[{"func":"(*ClientFactory).CreateClient","line":"158","source":"client.go"},{"func":"(*ClientFactory).createCachedAdminKubeClient","line":"133","source":"client.go"},{"func":"(*ClientFactory).GetKubeClient","line":"78","source":"client.go"},{"func":"storageDetect","line":"138","source":"endpointutils.go"},{"func":"InitialStorageDetection","line":"169","source":"endpointutils.go"},{"func":"(*Handler).endpointInspect","line":"66","source":"endpoint_inspect.go"},{"func":"LoggerHandler.ServeHTTP","line":"23","source":"error.go"},{"func":"LogUserActivityWithContext.func1.1","line":"37","source":"useractivity.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*RequestBouncer).mwUpgradeToRestrictedRequest.func1","line":"327","source":"bouncer.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*RequestBouncer).mwCheckPortainerAuthorizations.func1","line":"282","source":"bouncer.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*RequestBouncer).mwCheckLicense.func1","line":"254","source":"bouncer.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*RequestBouncer).mwAuthenticateFirst.func1","line":"375","source":"bouncer.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"mwSecureHeaders.func1","line":"479","source":"bouncer.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*Router).ServeHTTP","line":"210","source":"mux.go"},{"func":"StripPrefix.func1","line":"2152","source":"server.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*Handler).ServeHTTP","line":"223","source":"handler.go"},{"func":"(*OfflineGate).WaitingMiddleware.func1","line":"39","source":"offlinegate.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"(*Monitor).WithRedirect.func1","line":"117","source":"admin_monitor.go"},{"func":"HandlerFunc.ServeHTTP","line":"2109","source":"server.go"},{"func":"serverHandler.ServeHTTP","line":"2947","source":"server.go"},{"func":"(*conn).serve","line":"1991","source":"server.go"},{"func":"goexit","line":"1594","source":"asm_amd64.s"}]
2023/04/17 06:10PM INF github.com/portainer/portainer-ee/api/internal/endpointutils/endpointutils.go:176 > retrying storage detection in 30 seconds |

Steps to reproduce the issue:

  1. Connect Nomad cluster
  2. Try to open its Dashboard in Portainer

Technical details:


tamarahenson commented 1 year ago

@soupdiver

Thank you for the information. Can you provide the following:

[1] nomad job status <job name>
[2] Resource code for your job for testing on my end

Thanks!

soupdiver commented 1 year ago

1.

nomad job status home-heimdall
ID            = home-heimdall
Name          = home-heimdall
Submit Date   = 2023-04-07T21:27:47+02:00
Type          = service
Priority      = 50
Datacenters   = home
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
heimdall    0       0         1        1       17        2     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
5abcf919  41503e45  heimdall    3        run      running  4d10h ago  1d16h ago

2.

{
  "Stop": false,
  "Region": "global",
  "Namespace": "default",
  "ID": "home-heimdall",
  "ParentID": "",
  "Name": "home-heimdall",
  "Type": "service",
  "Priority": 50,
  "AllAtOnce": false,
  "Datacenters": [
    "home"
  ],
  "Constraints": null,
  "Affinities": null,
  "Spreads": null,
  "TaskGroups": [
    {
      "Name": "heimdall",
      "Count": 1,
      "Update": {
        "Stagger": 30000000000,
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000,
        "ProgressDeadline": 600000000000,
        "AutoRevert": false,
        "AutoPromote": false,
        "Canary": 0
      },
      "Migrate": {
        "MaxParallel": 1,
        "HealthCheck": "checks",
        "MinHealthyTime": 10000000000,
        "HealthyDeadline": 300000000000
      },
      "Constraints": [
        {
          "LTarget": "${attr.consul.version}",
          "RTarget": ">= 1.7.0",
          "Operand": "semver"
        }
      ],
      "Scaling": null,
      "RestartPolicy": {
        "Attempts": 2,
        "Interval": 1800000000000,
        "Delay": 15000000000,
        "Mode": "fail"
      },
      "Tasks": [
        {
          "Name": "heimdall",
          "Driver": "docker",
          "User": "",
          "Config": {
            "force_pull": true,
            "volumes": [
              "heimdall:/config"
            ],
            "volume_driver": "local",
            "image": "linuxserver/heimdall",
            "ports": [
              "http"
            ]
          },
          "Env": {
            "PUID": "1000",
            "PGID": "1000",
            "TZ": "Europe/Berlin"
          },
          "Services": [
            {
              "Name": "home-heimdall-heimdall-heimdall",
              "TaskName": "heimdall",
              "PortLabel": "http",
              "AddressMode": "auto",
              "Address": "",
              "EnableTagOverride": false,
              "Tags": [
                "traefik.enable=true",
                "traefik.http.routers.heimdall.entryPoints=web",
                "traefik.http.routers.heimdall.rule=Host(`home-heimdall-heimdall-heimdall.service.consul`)",
                "dc=home"
              ],
              "CanaryTags": null,
              "Checks": null,
              "Connect": null,
              "Meta": null,
              "CanaryMeta": null,
              "TaggedAddresses": null,
              "Namespace": "default",
              "OnUpdate": "require_healthy",
              "Provider": "consul"
            }
          ],
          "Vault": null,
          "Templates": null,
          "Constraints": null,
          "Affinities": null,
          "Resources": {
            "CPU": 200,
            "Cores": 0,
            "MemoryMB": 512,
            "MemoryMaxMB": 0,
            "DiskMB": 0,
            "IOPS": 0,
            "Networks": null,
            "Devices": null
          },
          "RestartPolicy": {
            "Attempts": 2,
            "Interval": 1800000000000,
            "Delay": 15000000000,
            "Mode": "fail"
          },
          "DispatchPayload": null,
          "Lifecycle": null,
          "Meta": null,
          "KillTimeout": 5000000000,
          "LogConfig": {
            "MaxFiles": 10,
            "MaxFileSizeMB": 10
          },
          "Artifacts": null,
          "Leader": false,
          "ShutdownDelay": 0,
          "VolumeMounts": null,
          "ScalingPolicies": null,
          "KillSignal": "",
          "Kind": "",
          "CSIPluginConfig": null,
          "Identity": null
        }
      ],
      "EphemeralDisk": {
        "Sticky": false,
        "SizeMB": 300,
        "Migrate": false
      },
      "Meta": null,
      "ReschedulePolicy": {
        "Attempts": 0,
        "Interval": 0,
        "Delay": 30000000000,
        "DelayFunction": "exponential",
        "MaxDelay": 3600000000000,
        "Unlimited": true
      },
      "Affinities": null,
      "Spreads": null,
      "Networks": [
        {
          "Mode": "",
          "Device": "",
          "CIDR": "",
          "IP": "",
          "Hostname": "",
          "MBits": 0,
          "DNS": null,
          "ReservedPorts": [
            {
              "Label": "http",
              "Value": 11000,
              "To": 80,
              "HostNetwork": "default"
            },
            {
              "Label": "https",
              "Value": 1001,
              "To": 443,
              "HostNetwork": "default"
            }
          ],
          "DynamicPorts": null
        }
      ],
      "Consul": {
        "Namespace": ""
      },
      "Services": null,
      "Volumes": null,
      "ShutdownDelay": null,
      "StopAfterClientDisconnect": null,
      "MaxClientDisconnect": null
    }
  ],
  "Update": {
    "Stagger": 30000000000,
    "MaxParallel": 1,
    "HealthCheck": "",
    "MinHealthyTime": 0,
    "HealthyDeadline": 0,
    "ProgressDeadline": 0,
    "AutoRevert": false,
    "AutoPromote": false,
    "Canary": 0
  },
  "Multiregion": null,
  "Periodic": null,
  "ParameterizedJob": null,
  "Dispatched": false,
  "DispatchIdempotencyToken": "",
  "Payload": null,
  "Meta": null,
  "ConsulToken": "",
  "ConsulNamespace": "",
  "VaultToken": "",
  "VaultNamespace": "",
  "NomadTokenID": "",
  "Status": "running",
  "StatusDescription": "",
  "Stable": true,
  "Version": 3,
  "SubmitTime": 1680895667926135300,
  "CreateIndex": 105,
  "ModifyIndex": 164527,
  "JobModifyIndex": 153912
}

tamarahenson commented 1 year ago

@soupdiver

Thank you for the additional information. I believe you may be running into this issue here: https://github.com/portainer/portainer/issues/8369#issuecomment-1404470807

How are you creating your Jobs? The aforementioned are being created via Terraform. I do have an existing internal issue logged. This should be resolved in an upcoming major release.

Thanks!

soupdiver commented 1 year ago

> @soupdiver
>
> Thank you for the additional information. I believe you may be running into this issue here: https://github.com/portainer/portainer/issues/8369#issuecomment-1404470807
>
> How are you creating your Jobs? The aforementioned are being created via Terraform. I do have an existing internal issue logged. This should be resolved in an upcoming major release.
>
> Thanks!

My jobs are created by simply pasting the job spec into the web ui of Nomad.

tamarahenson commented 1 year ago

@soupdiver

Thank you for the additional information.

Your process is:
[1] Create Job in Nomad
[2] View Nomad Jobs in Portainer

Can you check your Nomad Jobs and see if any are in a dead state? If you have a dead Job, Portainer will not display any Jobs; we have an internal request logged to resolve this. The workaround is to remove the dead Job from Nomad.
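The dead-job check described above can be sketched as a filter over Nomad's job list. This is a hypothetical illustration, not Portainer's actual code; the `JobStub` shape is a trimmed-down stand-in for an entry in Nomad's /v1/jobs response:

```go
package main

import "fmt"

// JobStub is a trimmed-down stand-in for an entry in Nomad's job-list
// response; Nomad reports job Status as "pending", "running", or "dead".
type JobStub struct {
	ID     string
	Status string
}

// deadJobs returns the IDs of jobs in the "dead" state, i.e. the jobs the
// workaround says to remove before Portainer can list jobs again.
func deadJobs(jobs []JobStub) []string {
	var ids []string
	for _, j := range jobs {
		if j.Status == "dead" {
			ids = append(ids, j.ID)
		}
	}
	return ids
}

func main() {
	jobs := []JobStub{
		{ID: "home-heimdall", Status: "running"},
		{ID: "old-batch", Status: "dead"},
	}
	fmt.Println(deadJobs(jobs)) // only the dead job's ID is printed
}
```

The same check can of course be done by eye from the Status column of `nomad job status`.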

In the interim, I can deploy a Nomad instance and test your Job on my end. I will update you as I learn more.

Thanks!

soupdiver commented 1 year ago

> Can you check your Nomad Jobs and see if any are in a dead state?

Nope, the jobs were healthy and running. They also had allocations and everything. The jobs were doing their jobs :)