seaweedfs / seaweedfs-operator

seaweedfs kubernetes operator
Apache License 2.0

Filer PersistentVolumeClaim Config #100

Closed · rusty-jules closed this 9 months ago

rusty-jules commented 9 months ago

Hi and thank you for this wonderful project!

I was in need of a solution to #93, but didn't find a pull request for it. So here's mine.

This uses the same method as controller_volume_statefulset.go for creating PVCs. It adds the PersistentVolumeClaimSpec API under the Persistence key of Filer, in a style similar to the Bitnami PostgreSQL Helm chart: you can either specify a volumeClaimTemplate or point the config at an existingClaim, which is convenient for managing retention policies and the like.

The full API was added so we can set defaults for required fields at the kubebuilder level. That makes the fields more obvious to users, since the defaults will appear in the created Seaweed CRD and in the resource definition when not overridden.
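
To make the shape concrete, here is a rough sketch of how a filer section could look with this change. The field names (existingClaim, volumeClaimTemplate) follow the description above, but the exact spelling comes from the PR itself and may differ:

filer:
  replicas: 2
  persistence:
    enabled: true
    # Option 1: reuse a claim you manage yourself (e.g. with its own retention policy)
    existingClaim: my-filer-data        # hypothetical claim name
    # Option 2: let the operator create the PVC from a PVC-spec-style template
    # volumeClaimTemplate:
    #   accessModes: ["ReadWriteOnce"]
    #   resources:
    #     requests:
    #       storage: 10Gi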

Running make manifests off master (bb4ec87e43ec5e6d76b9e9b4a86668bb136c90ac) updated more than just the Persistence field in the CRD manifest, so the diff is quite large.

Todo

okgolove commented 5 months ago

Hi @rusty-jules! I've noticed that even when I use persistent storage for the filers (leveldb2), after restarting all the components (masters, volumes, filers) I can't see any files in weed shell. Don't we need persistent storage for the master and volume data dirs as well?

rusty-jules commented 5 months ago

Hi @okgolove!

This PR uses the same method for assigning persistent volumes to filers that the volume StatefulSet controller was already using - I'm not sure how you could run volume servers without persistent volumes. Here's the YAML of my Seaweed CRD where PVCs are assigned to volume servers:

apiVersion: seaweed.seaweedfs.com/v1
kind: Seaweed
metadata:
  name: seaweed
  namespace: seaweedfs
spec:
  image: chrislusf/seaweedfs:latest
  master:
    replicas: 3
    volumeSizeLimitMB: 1024
  volume:
    replicas: 2
    requests:
      storage: 50Gi # creates PVCs via volumeClaimTemplate
  filer:
    replicas: 2
    persistence:
      enabled: true
    config: |-
      [leveldb2]
      enabled = true
      dir = "/data/filerldb2"
  volumeServerDiskCount: 1
  hostSuffix: myhost.com

As for master servers, I have no idea! After reading the wiki I had assumed that with HA via Raft the data would not be lost, and I'm not even exactly sure what they store, but maybe @chrislusf can shed some light here.

I have not noticed any data loss on component restarts when enabling filer persistence, even when entire nodes get auto-scaled/consolidated in my cluster. I have, however, noticed that I sometimes get "volume not staged" errors when using FUSE-mounted volumes, and can't get them to become staged without scaling the deployment that mounts the volume all the way down and back up again. Perhaps this has to do with master server persistence? It seems unlikely to me, though, because it only happens for certain deployments bound to a specific PVC at a time (for context, I created multiple PVCs, one for each deployment, but all pointed at the same collection, essentially to model an NFS-like ReadWriteMany PVC that can be bound in multiple namespaces).

rusty-jules commented 5 months ago

Here's the error I see pretty consistently for FUSE-mounted PVCs.

FailedMount: MountVolume.SetUp failed for volume "my-pvc-name" : rpc error: code = FailedPrecondition desc = volume hasn't been staged yet

Sometimes all pods in a deployment get this error, and sometimes only some will get it while others will have the PVC mounted. Scaling all pods down has been the only solution. It would be amazing if master server persistence solved this, but I have not been tracking master server restarts to confirm whether this is even related.

I use a StorageClass pointed at a SeaweedFS collection, along with preconfigured PersistentVolumes that use that StorageClass and point at the same volumeAttributes.collection and volumeAttributes.path, so all PVs reference the same data. Works great when it works!
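
For reference, a minimal sketch of that setup, assuming the SeaweedFS CSI driver is installed; the driver name, collection, path, and PV names here are placeholders rather than my exact config:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: seaweedfs-shared
provisioner: seaweedfs-csi-driver.seaweedfs.com  # assumed CSI driver name
parameters:
  collection: shared                             # every volume lands in this collection
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-namespace-a                       # one PV per consuming namespace/deployment
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  storageClassName: seaweedfs-shared
  csi:
    driver: seaweedfs-csi-driver.seaweedfs.com   # assumed CSI driver name
    volumeHandle: shared-namespace-a
    volumeAttributes:
      collection: shared                         # same collection in every PV
      path: /shared                              # same path, so all PVs see the same files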