nmaupu / freenas-provisioner

Kubernetes external provisioner using Freenas as backend
Apache License 2.0
133 stars 26 forks source link

Broken in FreeNAS 11.3-RELEASE #34

Closed maxirus closed 4 years ago

maxirus commented 4 years ago

Seems FreeNAS v11.3-RELEASE changed the capacity to IEC Suffixes. New PVC yield the error:

I0212 04:58:09.995962       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"test", Name:"test-config", UID:"9f22bf96-256f-405f-adc4-9f6bb5ca373f", APIVersion:"v1", ResourceVersion:"1431038", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "freenas-nfs": Error creating dataset "tank/Kubernetes/test/test-config" - message: {"refquota":["Specify the value with IEC suffixes, e.g. 10 GiB"]}, status: 400

This happens when setting spec.resources.requests.storage to nG and nGi.

Does this need to suffix the value with B?

travisghansen commented 4 years ago

Hmm, that's odd. We actually send that value as an integer I believe so it converts whatever the value is in the request to a number. I'll have to do an 11.3 install and see what's up.

maxirus commented 4 years ago

I have been searching the FreeNAS code base looking for this message and all I've come up with thus far is an old version of it here.

My guess is the RegEx changed.

travisghansen commented 4 years ago

I'll update my test system to 11.3 and do some trial and error and see what I can find. It's really unfortunate they keep making changes like this making it difficult to maintain sanity across versions :(

In the csi driver all the zfs operations are done via the cli over ssh so this kind of garbage should be highly mitigated.

maxirus commented 4 years ago

Sounds good. Let me know if I can help in anyway. In the mean time I will keep trying to debug.

One thing to note, it does appear this code base simply passes along the value in spec.resources.requests.storage in the PVC. In K8s that's 10GiB for example. In the FreeNAS UI, this option is formatted to 10 GiB (notice the space). Not sure if this makes a difference.

Also, I noticed none of the PVC I created before upgrading to 11.3 have their quotas set even though I have it enabled. I'm wondering if in v11.2 if this simply silently failed.

travisghansen commented 4 years ago

What version of 11.2 have you been running. I've been on various versions of 11.2 for a long time and the quotas are always set.

I'm wondering if this is connected to creating the namespace dataset...do you have that enabled? Subsequently, do you have quotas enabled on the namespaces?

Can you send over a copy of your class definition?

travisghansen commented 4 years ago

Yeah, so I've confirmed the provisioner does send the value as an integer/number in the JSON over the wire. Also confirmed that 11.3 is rejecting that syntax which is highly unfortunate. To make matters worse the response value for that field is still returned as a number in the json.

My go skills are pretty limited but the structs that marshal/unmarshal this stuff expect the value types to be the same for both in/outbound so I'm not sure if it's even possible to deal with this without a major headache. Perhaps @nmaupu has some idea?

It does appear as if 11.2 supports sending the value as a string so at least there's that (meaning, if we change it to send a string we hopefully aren't breaking all the older versions for FreeNAS). Not sure if all the older versions of FreeNAS would support that syntax though..

travisghansen commented 4 years ago

Pretty sure this diff would solve your problem:

diff --git a/freenas/dataset.go b/freenas/dataset.go
index 0305c74..f7d25ea 100644
--- a/freenas/dataset.go
+++ b/freenas/dataset.go
@@ -6,6 +6,7 @@ import (
        "fmt"
        "github.com/golang/glog"
        "path/filepath"
+       "strconv"
 )

 var (
@@ -27,6 +28,41 @@ type Dataset struct {
        Comments       string `json:"comments,omitempty"`
 }

+func (d *Dataset) MarshalJSON() ([]byte, error) {
+       data := &struct {
+               Avail          int64  `json:"avail,omitempty"`
+               Mountpoint     string `json:"mountpoint,omitempty"`
+               Name           string `json:"name"`
+               Pool           string `json:"pool"`
+               Recordsize     int64  `json:"recordsize,omitempty"`
+               Quota          int64  `json:"quota,omitempty"`
+               Reservation    int64  `json:"reservation,omitempty"`
+               Refquota       string `json:"refquota,omitempty"`
+               Refreservation int64  `json:"refreservation,omitempty"`
+               Refer          int64  `json:"refer,omitempty"`
+               Used           int64  `json:"used,omitempty"`
+               Comments       string `json:"comments,omitempty"`
+       }{
+               Avail:          d.Avail,
+               Mountpoint:     d.Mountpoint,
+               Name:           d.Name,
+               Pool:           d.Pool,
+               Recordsize:     d.Recordsize,
+               Quota:          d.Quota,
+               Reservation:    d.Reservation,
+               Refreservation: d.Refreservation,
+               Refer:          d.Refer,
+               Used:           d.Used,
+               Comments:       d.Comments,
+       }
+
+       if d.Refquota > 0 {
+               data.Refquota = strconv.FormatInt(d.Refquota, 10) + "b"
+       }
+
+       return json.Marshal(data)
+}
+
 func (d *Dataset) String() string {
        return filepath.Join(d.Pool, d.Name)
 }

Based on suggestions here: http://choly.ca/post/go-json-marshalling/

maxirus commented 4 years ago

What version of 11.2 have you been running.

I was on v11.2-U7.

I'm wondering if this is connected to creating the namespace dataset...do you have that enabled?

Yes

Subsequently, do you have quotas enabled on the namespaces?

No

Can you send over a copy of your class definition?

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: freenas-nfs
  labels:
    app.kubernetes.io/instance: mighty-mosquito
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: freenas-provisioner
    app.kubernetes.io/version: '2.5'
    helm.sh/chart: freenas-provisioner-0.1.1
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
provisioner: freenas.org/nfs
parameters:
  datasetDeterministicNames: 'true'
  datasetEnableNamespaces: 'true'
  datasetEnableQuotas: 'true'
  datasetEnableReservation: 'false'
  datasetParentName: tank/Kubernetes
  datasetPermissionsGroup: wheel
  datasetPermissionsMode: '0777'
  datasetPermissionsUser: root
  datasetRetainPreExisting: 'true'
  serverSecretName: freenas-nfs
  serverSecretNamespace: freenas
  shareAlldirs: 'true'
  shareAllowedHosts: ''
  shareAllowedNetworks: ''
  shareMaprootGroup: wheel
  shareMaprootUser: root
  shareRetainPreExisting: 'true'
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

So you're saying options.PVC.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)].Value() returns an integer? Is it converting the String (ie. 10 GiB) to bytes as an Integer? Interesting that the API Docs show an integer in the example yet the Docs say the data type is expecting a string.

travisghansen commented 4 years ago

It does return an integer yes. That's what gets sent over the wire as a json number too.

The patch above will send it over the wire as a string and add a b to it.

maxirus commented 4 years ago

Gotcha. The workaround is to just disable datasetEnableQuotas for the time being but of course this allows to the volumes to grow unbounded so not ideal.

Will you be committing the addition you provided above?

Tibernut commented 4 years ago

I tried the patch above but still ran into the same issues but this time with refreservation. I added an if statement to your patch in the same way that you handled Refquota and am able to provision pvs normally now.

My install is a pretty barebones, proof of concept though, so it could be there are other issues lurking.

Here was what I was seeing from wireshark before I modified your patch

From provisioner
JavaScript Object Notation: application/json
    Object
        Member Key: name
            String value: freenas-test-pvc
            Key: name
        Member Key: pool
            String value: datastore01
            Key: pool
        Member Key: refquota
            String value: 1073741824b
            Key: refquota
        Member Key: refreservation
            Number value: 1073741824
            Key: refreservation
        Member Key: comments
            String value: /default/freenas-test-pvc
            Key: comments

From freenas
JavaScript Object Notation: application/json
    Object
        Member Key: refreservation
            Array
                String value: Specify the value with IEC suffixes, e.g. 10 GiB
            Key: refreservation

And the error from the provisioner

  Warning  ProvisioningFailed    9s (x4 over 55s)   freenas.org/nfs_freenas-nfs-provisioner-c5f4447b7-m27rr_6f5e1d2d-52c2-11ea-aaa4-aa8ced21171f  failed to provision volume with StorageClass "freenas-nfs": Error creating dataset "datastore01/default/freenas-test-pvc" - message: {"refreservation":["Specify the value with IEC suffixes, e.g. 10 GiB"]}, status: 400

More detail what I did to get it working in my cluster: From your patch I removed

Refreservation: d.Refreservation,

and added

       if d.Refreservation > 0 {
               data.Refreservation = strconv.FormatInt(d.Refreservation, 10) +  " b"
       }

Thank you Travis and Nicolas for this project! I look forward to trying your csi driver based provisioner.

maxirus commented 4 years ago

Ah, makes sense that the reservation value would change too. Actually, looking at the API Docs, it seems most of them changed to String.

More detail what I did to get it working in my cluster: From your patch I removed Refreservation: d.Refreservation,

I don't think you need to remove this. Think you just need to change it to a string type in the struct. Could be wrong; I'm far from a Go dev.

travisghansen commented 4 years ago

The csi driver is ready to go if you're waiting for something there's no need to wait.

I wondered if other values would exhibit the same issue. Good to know. I'll fix up a patch with that field included as well. Not sure about commiting at this point. I need to try that against some older FreeNAS versions to make sure we're not breaking something in the other direction. If it works going back then I'd have no issue commiting it personally. I would assume it works going back to pretty old versions, but I've learned not to assume anything with the FreeNAS API :(

maxirus commented 4 years ago

I can't seem to find the CSI driver repo. I see your ISCSI variant but not CSI. Am I missing something?

travisghansen commented 4 years ago

Probably, it's here: https://github.com/democratic-csi/charts

Documentation is missing still but there are relevant examples in the repo which should get you started. What tool are you using to install your cluster?

alerosso commented 4 years ago

Will you be committing the addition you provided above?

Hey guys, I'm interested too in a new release with the fix!

travisghansen commented 4 years ago

Does anyone on the thread have older versions of FreeNAS installed they could run a curl command against to test on older versions for me?

travisghansen commented 4 years ago

I've got an old install FreeNAS-9.3-STABLE-201503200528 that doesn't appear to have that endpoint at all...so I guess that's a non-issue. Anyone with 9.10 or older versions of 11?

maxirus commented 4 years ago

I do not but I'm wondering if I could test this in K8s or Docker... I can research this later.

Another thought I had would be to check the FreeNAS version through the API to make this new functionality conditional.

travisghansen commented 4 years ago

It could be yes, but the version endpoint is a bit of a mess over time as well (considered using it for the csi side of the equation). That combined with gos strong typing would make for some pretty messy stuff but doable.

Right now it appears there are 4 fields which require this in 11.3. Updated diff below:

diff --git a/freenas/dataset.go b/freenas/dataset.go
index 0305c74..a60be1e 100644
--- a/freenas/dataset.go
+++ b/freenas/dataset.go
@@ -6,6 +6,7 @@ import (
        "fmt"
        "github.com/golang/glog"
        "path/filepath"
+       "strconv"
 )

 var (
@@ -27,6 +28,49 @@ type Dataset struct {
        Comments       string `json:"comments,omitempty"`
 }

+func (d *Dataset) MarshalJSON() ([]byte, error) {
+       data := &struct {
+               Avail          int64  `json:"avail,omitempty"`
+               Mountpoint     string `json:"mountpoint,omitempty"`
+               Name           string `json:"name"`
+               Pool           string `json:"pool"`
+               Recordsize     int64  `json:"recordsize,omitempty"`
+               Quota          string `json:"quota,omitempty"`
+               Reservation    string `json:"reservation,omitempty"`
+               Refquota       string `json:"refquota,omitempty"`
+               Refreservation string `json:"refreservation,omitempty"`
+               Refer          int64  `json:"refer,omitempty"`
+               Used           int64  `json:"used,omitempty"`
+               Comments       string `json:"comments,omitempty"`
+       }{
+               Avail:          d.Avail,
+               Mountpoint:     d.Mountpoint,
+               Name:           d.Name,
+               Pool:           d.Pool,
+               Recordsize:     d.Recordsize,
+               Refer:          d.Refer,
+               Used:           d.Used,
+               Comments:       d.Comments,
+       }
+       if d.Quota > 0 {
+               data.Quota = strconv.FormatInt(d.Quota, 10) + "b"
+       }
+
+       if d.Reservation > 0 {
+               data.Reservation = strconv.FormatInt(d.Reservation, 10) + "b"
+       }
+
+       if d.Refquota > 0 {
+               data.Refquota = strconv.FormatInt(d.Refquota, 10) + "b"
+       }
+
+       if d.Refreservation > 0 {
+               data.Refreservation = strconv.FormatInt(d.Refreservation, 10) + "b"
+       }
+
+       return json.Marshal(data)
+}
+
 func (d *Dataset) String() string {
        return filepath.Join(d.Pool, d.Name)
 }

At this point I'd lean toward running with the patch and deal with older versions of FreeNAS should it ever come up. I know the updated syntax at least works with the latest 11.2.

nmaupu commented 4 years ago

Freenas upgrades and API breaking changes keep bugging me year after year... I do have a FreeNAS-11.2-U7 to make some tests if need be but I guess an older version is needed... Best bet is maybe to put a flag in the conf to specify which version of the code to use so retro compatibility is assured.

travisghansen commented 4 years ago

@nmaupu agreed on the API changing madness! That's one of the many reasons I decided to go with raw zfs commands over ssh with the csi project..the surface area of API calls there is very limited.

The version stuff is OK, but as I mentioned the version strings returned over the years from the API are madness as well. My vote would be to merge what we have above and just run with it. If issues come up for older versions then address them if necessary.

travisghansen commented 4 years ago

Just created #35

maxirus commented 4 years ago

What tool are you using to install your cluster?

Missed this Q the first read through. Kubernetes cluster? Currently using Terraform + K3s + Helm3.

...decided to go with raw zfs commands over ssh with the csi project

Noticed that. IMO the API is more scalable and provides a much easier integration. In Enterprise world, often times SSH is blocked and/or disabled. I can definitely understand the frustrations with FreeNAS not following SemVer with regards to their API but I think that will improve. As more third-party projects are built around it and their own UI now heavily dependent upon it, I think we'll see a change here.

travisghansen commented 4 years ago

Ok, iscsi with csi with rke or rancher can be tricky so just wanted to make sure you were set on that.

That's fair about ssh, I had several other reasons as well for example it will work with any zfs based solution very easily...(ie with very little extra code it supports any generic ZoL install such as Ubuntu to rhel). Also, I'm doing some fairly advanced stuff with zfs properties, clones, snapshots, zfs send/receive, etc that simply don't exist now in their API and likely never will (unless they create a generic API method to allow arbitrary shell exec).

In any case, the project was built to generally support many 'drivers' anyhow so a pure HTTP FreeNAS driver theoretically could be added for sure if there's need/interest. I'm planning to create another zfs driver for ephemeral volumes for example when I get a little extra time.

alerosso commented 4 years ago

News about the pull request #35 ?

shairozan commented 4 years ago

So I can see the PR was merged, but I just pulled today and am still experiencing the problem. FreeNAS-11.3-U2.1

After a quick look the go code above I realized I just needed to disable Reservations and Quotas. Sorry for the confusion

schnable commented 4 years ago

Should this be closed? The failure still occurs when the default configuration is used (reservations and quotas enabled). Is there another issue to track that?

travisghansen commented 4 years ago

The default yaml is still pointing to an older version. Try updating to 2.6 and see how it goes.

schnable commented 4 years ago

Yes - I pulled and tested with 2.6. I still needed to update my storage class (based on deploy/class.yaml) to explicitly disable quotas and reservations. According to the comments there, these are enabled by default - and when they are enabled you will reproduce this...

travisghansen commented 4 years ago

If you're on the latest code you shouldn't be hitting this issue. There's another 'bug' where if you're on FreeNAS 11.3 (I think) FreeNAS no longer lets you create datasets smaller than 1G...are you sure you aren't hitting that?

schnable commented 4 years ago

just by setting the quota / reservation setting to "false" is enough to get this working - so I don't think it it dataset size issue. I will retry with a fresh clone / fresh build (arm) and see if there is something off my build/deploy.

travisghansen commented 4 years ago

That's because if you turn those off it just creates a dataset that inherits the size of the parent and ignores the quota/request from k8s :)

ronaldsjordan commented 3 years ago

I'm running FreeNAS 11.3U5 and freenas-provisioner 2.6 and I'm getting the following error when provisioning the pvc in test-claim.pvc:

failed to provision volume with StorageClass "freenas-nfs": Error creating dataset "vol0/k8s-pv/default/freenas-test-pvc" - message: {"refquota":["Specify the value with IEC suffixes, e.g. 10 GiB"]}, status: 400

I ran a tcpdump and confirmed the JSON in the HTTP Post is not appending 'B' to the quota. Here is the JSON I got from the Post in the capture:

{"name":"freenas-test-pvc","pool":"vol0","refquota":1073741824,"comments":"/default/freenas-test-pvc"}

The only changes I made was to increase the storage size to '1Gi' in test-claim.yaml due to the minimum quota value in FreeNAS 11.3, and I also set datasetEnableReservation to false in class.yaml because I don't want a reservation.