seaweedfs / seaweedfs-csi-driver

SeaweedFS CSI Driver https://github.com/seaweedfs/seaweedfs
Apache License 2.0

Panic on startup (nil dereference) running on Nomad #133

Closed Cottand closed 1 year ago

Cottand commented 1 year ago

I am running this as a CSI plugin to Nomad. I followed this example, except

The CSI plugin fails on every Nomad client (any pod), so I think the trace is not specific to the host machine, although all my machines are configured very similarly. The CSI image is on latest; the filer, volume servers, master, etc. are on 3.55.

Logs:

I0807 23:46:18.075502 driver.go:105 starting
I0807 23:46:18.075881 server.go:94 Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcbb35e]

goroutine 72 [running]:
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.(*ControllerServer).ControllerGetCapabilities(0x0, {0xc000125940?, 0x40da07?}, 0x10?)
    /go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/controllerserver.go:179 +0x5e
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler.func1({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0?, 0xc000446200})
    /go/pkg/mod/github.com/container-storage-interface/spec@v1.8.0/lib/go/csi/csi.pb.go:6546 +0x78
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.logGRPC({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0, 0xc000446200}, 0xc000446220, 0xc0000a8318)
    /go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/utils.go:64 +0x132
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler({0xe6fe20?, 0x0}, {0x101cef0, 0xc0003f2ea0}, 0xc0002a4310, 0xf2ef48)
    /go/pkg/mod/github.com/container-storage-interface/spec@v1.8.0/lib/go/csi/csi.pb.go:6548 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0xc00031b410, 0x168cba8, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:1360 +0xe23
google.golang.org/grpc.(*Server).handleStream(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:1737 +0xa36
google.golang.org/grpc.(*Server).serveStreams.func1.1()
    /go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:982 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /go/pkg/mod/google.golang.org/grpc@v1.57.0/server.go:980 +0x18c

CSI plugin job:

job "seaweedfs-plugin" {
  datacenters = ["dc1"]
  type        = "system"
  update {
    max_parallel = 1
    stagger      = "60s"
  }

  # only one plugin of a given type and ID should be deployed on
  # any given client node
  constraint {
    operator = "distinct_hosts"
    value    = true
  }

  group "nodes" {
    ephemeral_disk {
      migrate = false
      size    = 5000
      sticky  = false
    }
    restart {
      interval = "5m"
      attempts = 10
      delay    = "15s"
      mode     = "delay"
    }
    # does not need to run on a client with seaweed, only needs docker privileged
    task "plugin" {
      driver = "docker"

      template {
        destination = "config/.env"
        change_mode = "restart"
        env         = true
        data        = <<-EOF
{{ range $i, $s := nomadService "seaweedfs-filer-http" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_http={{ .Address }}
SEAWEEDFS_FILER_PORT_http={{ .Port }}
{{- end -}}
{{ end }}
{{ range $i, $s := nomadService "seaweedfs-filer-grpc" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_grpc={{ .Address }}
SEAWEEDFS_FILER_PORT_grpc={{ .Port }}
{{- end -}}
{{ end }}
EOF
      }

      config {
        network_mode = "host"
        image        = "chrislusf/seaweedfs-csi-driver:latest"
        force_pull   = "true"

        args = [
          "--endpoint=unix://csi/csi.sock",
          "--filer=${SEAWEEDFS_FILER_IP_http}:${SEAWEEDFS_FILER_PORT_http}.${SEAWEEDFS_FILER_PORT_grpc}",
          "--nodeid=${node.unique.name}",
          "--cacheCapacityMB=1000",
          "--cacheDir=${NOMAD_TASK_DIR}/cache_dir",
        ]

        privileged = true
      }

      csi_plugin {
        id        = "seaweedfs"
        type      = "monolith"
        mount_dir = "/csi"
      }
      resources {
        cpu        = 100
        memory     = 512
        memory_max = 2048
      }
    }
  }
}
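The --filer argument above packs three template values into one string of the form ip:httpPort.grpcPort. The Go sketch below shows how such a value can be split. It assumes SeaweedFS's host:port.grpcPort address convention and its usual default of gRPC port = HTTP port + 10000 when the .grpcPort suffix is absent; parseFilerAddress is a hypothetical helper, not the driver's own parser.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseFilerAddress (hypothetical) splits "host:httpPort.grpcPort" into its parts.
// Assumption: when ".grpcPort" is omitted, the gRPC port is httpPort+10000.
func parseFilerAddress(addr string) (string, int, int, error) {
	// SplitHostPort splits at the last colon and does not validate the port text,
	// so "8888.18888" comes back intact as the port part.
	host, ports, err := net.SplitHostPort(addr)
	if err != nil {
		return "", 0, 0, err
	}
	httpStr, grpcStr, hasGrpc := strings.Cut(ports, ".")
	httpPort, err := strconv.Atoi(httpStr)
	if err != nil {
		return "", 0, 0, fmt.Errorf("invalid HTTP port %q: %w", httpStr, err)
	}
	grpcPort := httpPort + 10000 // assumed default offset
	if hasGrpc {
		if grpcPort, err = strconv.Atoi(grpcStr); err != nil {
			return "", 0, 0, fmt.Errorf("invalid gRPC port %q: %w", grpcStr, err)
		}
	}
	return host, httpPort, grpcPort, nil
}

func main() {
	// Mirrors "--filer=${SEAWEEDFS_FILER_IP_http}:${SEAWEEDFS_FILER_PORT_http}.${SEAWEEDFS_FILER_PORT_grpc}".
	host, httpPort, grpcPort, err := parseFilerAddress("10.0.0.5:8888.18888")
	if err != nil {
		panic(err)
	}
	fmt.Println(host, httpPort, grpcPort) // 10.0.0.5 8888 18888
}
```

An empty rendering such as :. is rejected by this sketch with an error instead of being passed along silently.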

Let me know if I should provide more info.

chrislusf commented 1 year ago

cc @kvaster, possibly related to the recent PRs? Or do the docs need changes?

kvaster commented 1 year ago

I'm investigating. It looks really strange.

kvaster commented 1 year ago

Yes, it is related to my changes; I will open one more PR in 30 minutes. The problem is that I introduced an incompatibility with previous setups. From now on, you should run the driver with either --controller or --node, or both at the same time.
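The trace is consistent with this explanation: ControllerGetCapabilities is entered with a receiver of 0x0, i.e. a nil *ControllerServer. Go allows calling a pointer method on a nil receiver; the crash only happens once the method dereferences it. A minimal sketch with a hypothetical controllerServer type (its field is invented, not the driver's) reproduces the pattern and recovers from the panic so the nil and constructed cases can be compared.

```go
package main

import "fmt"

// controllerServer is a hypothetical stand-in for the driver's ControllerServer.
type controllerServer struct {
	capabilities []string
}

// getCapabilities reads a field, so calling it on a nil receiver panics
// with "invalid memory address or nil pointer dereference".
func (cs *controllerServer) getCapabilities() []string {
	return cs.capabilities
}

// callSafely invokes the handler and reports whether it panicked.
func callSafely(cs *controllerServer) (caps []string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return cs.getCapabilities(), false
}

func main() {
	var cs *controllerServer // never constructed, like a role that was not enabled
	_, panicked := callSafely(cs)
	fmt.Println("nil receiver panicked:", panicked) // nil receiver panicked: true

	caps, _ := callSafely(&controllerServer{capabilities: []string{"CREATE_DELETE_VOLUME"}})
	fmt.Println(caps) // [CREATE_DELETE_VOLUME]
}
```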

Cottand commented 1 year ago

If this is the result of a breaking change, I would ideally expect

thanks!

kvaster commented 1 year ago

It was not supposed to be a breaking change. I've made a PR which fixes the problem. Previous installs were supposed to keep working without any changes.
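Keeping old installs working means the driver has to choose a default when neither role flag is passed. One way to do that is to treat "no role flags" as "both roles"; the sketch below does so with Go's standard flag package. parseRoles and resolveRoles are hypothetical helpers, and this is not necessarily how the fix PR implements it.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveRoles (hypothetical) applies a backward-compatible default:
// if neither role is requested, run both, as a pre-refactoring setup expects.
func resolveRoles(controller, node bool) (bool, bool) {
	if !controller && !node {
		return true, true
	}
	return controller, node
}

// parseRoles reads --controller and --node (flag names taken from the thread)
// from args and applies the default above.
func parseRoles(args []string) (bool, bool, error) {
	fs := flag.NewFlagSet("csi-driver", flag.ContinueOnError)
	controller := fs.Bool("controller", false, "run the CSI controller service")
	node := fs.Bool("node", false, "run the CSI node service")
	if err := fs.Parse(args); err != nil {
		return false, false, err
	}
	c, n := resolveRoles(*controller, *node)
	return c, n, nil
}

func main() {
	// An old-style invocation (no role flags), a node-only daemon, and both roles explicitly.
	for _, args := range [][]string{{}, {"--node"}, {"--controller", "--node"}} {
		c, n, err := parseRoles(args)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%v -> controller=%v node=%v\n", args, c, n)
	}
}
```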

Cottand commented 1 year ago

I see, no worries then. In that case I would appreciate some docs on what --controller and --node do, and on the other available options.

kvaster commented 1 year ago

It was a big refactoring for running the driver in Kubernetes. The controller server should run separately from the node server. The node server is a daemon that runs on every node that can mount SeaweedFS, while the controller only needs to be fail-safe and highly available (HA).
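The CSI specification defines three gRPC services: Identity, Controller, and Node. The sketch below, using a hypothetical servicesFor helper, shows how the controller/node split maps onto them: a Kubernetes controller deployment and a node daemon each serve a subset, while one process with both roles serves all three. In Nomad, csi_plugin type = "monolith" (as in the job above) is the plugin type that acts as both controller and node. Registering a service only when its role is enabled is an assumed design choice that keeps a nil server from ever being exposed.

```go
package main

import "fmt"

// servicesFor (hypothetical) lists the CSI gRPC services a process would
// register for the chosen roles. Identity is always served.
func servicesFor(controller, node bool) []string {
	services := []string{"Identity"}
	if controller {
		services = append(services, "Controller")
	}
	if node {
		services = append(services, "Node")
	}
	return services
}

func main() {
	// Kubernetes-style split: a replicated controller plus a node daemon on every node.
	fmt.Println("controller deployment:", servicesFor(true, false))
	fmt.Println("node daemon:", servicesFor(false, true))
	// Nomad "monolith" plugin: one process serving both roles.
	fmt.Println("monolith:", servicesFor(true, true))
}
```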

kvaster commented 1 year ago

It's all about CSI.

Cottand commented 1 year ago

To achieve the same behaviour as before, can I safely use both options on all boxes? Or will the controllers need to talk to each other, and will that somehow increase gossip?

That is, is the example Nomad deployment unchanged (I might need to run a controller separately), or do I have better options now for HA or performance?

Edit: I still get the nil dereference when using both options on my existing setup.

Cottand commented 1 year ago

@chrislusf, you marked this as completed, but in https://github.com/seaweedfs/seaweedfs-csi-driver/pull/134 you did not update the Nomad example (only the Helm charts). Do the default options for Nomad remain unchanged?

kvaster commented 1 year ago

Yes. The default options remain unchanged now, as they should be.