hpe-storage / truenas-csp

TrueNAS Container Storage Provider for HPE CSI Driver for Kubernetes
https://scod.hpedev.io
MIT License

FreeNAS 11.3 support #6

Closed · maxirus closed this 3 years ago

maxirus commented 3 years ago

Adds compatibility with FreeNAS 11.3. Fixes #5

Note: the PR in common-host-libs must be merged before this will work.

maxirus commented 3 years ago

I assume you're referring to this HPE spec. I took a more K8s-native approach, using the APIs it provides, as I didn't see much value in doing string manipulation. However, I'm willing to change this.

> Which behaviour to employ should be determined by the value of the `system/version` API resource

How do you propose to achieve this? I've checked various API endpoints and there's only a handful that do not require authentication (e.g. `system/product_name`). The one we'd need for this is `system/version`, which requires authentication. I did find that the `/ui/` endpoint has an `etag` header with this info:

$ curl -IL -s https://<freenas_host>/ui/ | grep etag
etag: FreeNAS-11.3-U5

but I'm not sure how reliable this would be.
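
For what it's worth, here's a minimal sketch of what such a check could look like in Python (assuming `requests` is available and that the etag keeps the `FreeNAS-<version>` / `TrueNAS-<version>` shape; the host handling is a placeholder, not code from this PR):

```python
import re
import requests

def detect_platform(host, verify_tls=False):
    """Best-effort platform/version detection from the unauthenticated /ui/ etag header."""
    # HEAD avoids fetching the whole UI page; redirects must be followed to land on /ui/
    resp = requests.head(f"https://{host}/ui/", verify=verify_tls, allow_redirects=True)
    etag = resp.headers.get("etag", "")
    # Expecting something like FreeNAS-11.3-U5 or TrueNAS-12.0-U1 (possibly quoted)
    match = re.match(r'"?(FreeNAS|TrueNAS)-([\w.-]+)', etag)
    if not match:
        return None, None
    return match.group(1), match.group(2)
```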

datamattsson commented 3 years ago

Apologies for the late reply. Busy times at my real work.

Re: the authentication piece, it's not very elegant unless you have token-based authentication the way all the HPE backends handle it. Ideally the CSP spec should allow more auth methods, like HTTP Basic. I think there's an approach where you can use the Authorization header directly with the base64-encoded string. Using the creds in environment variables is not acceptable, as creds should come from the CSI driver to the CSP.
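
To illustrate that idea, a minimal sketch of building the Authorization header from a base64-encoded username:password pair (the helper and variable names are illustrative, not the CSP's actual code):

```python
import base64
import requests

def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header from credentials handed over by the CSI driver."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# Example: hit the authenticated system/version resource with the constructed header
# resp = requests.get(f"https://{host}/api/v2.0/system/version",
#                     headers=basic_auth_header(user, password), verify=False)
```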

As for checking the platform (TrueNAS vs FreeNAS), I was simply getting ahead of myself. If we don't think the etag header (as you found) is reliable enough for this purpose, we can fall back to username/password when the API key doesn't work.
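
A rough sketch of that fallback order, assuming the TrueNAS API key is sent as a Bearer token and username/password is the FreeNAS 11.3 path (the endpoint and names shown here are illustrative):

```python
import requests

def query_system_version(host, api_key, username, password, verify_tls=False):
    """Try API-key auth first (TrueNAS), then fall back to username/password (FreeNAS 11.3)."""
    url = f"https://{host}/api/v2.0/system/version"
    resp = requests.get(url, headers={"Authorization": f"Bearer {api_key}"}, verify=verify_tls)
    if resp.status_code == 401:
        # API keys do not exist on FreeNAS 11.3, so retry with HTTP Basic credentials
        resp = requests.get(url, auth=(username, password), verify=verify_tls)
    resp.raise_for_status()
    return resp.json()
```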

maxirus commented 3 years ago

Ok, I think that should be a relatively easy change.

I've been tinkering with this a bit and have found a few issues/suggestions:

  1. The `pool/dataset` queries should be locked to what's given in the Storage Class definition (working on this now)
  2. Password is leaked to the logs when debug logging is enabled
  3. Each class should instantiate its own logger
  4. Getters/setters should be added to the backend instead of directly modifying instance variables
  5. The upstream helm chart doesn't allow disabling the unneeded components. Allowing these to be disabled (e.g. nimble-csp) would eliminate the need to maintain a copy of core components via a sub-chart.

datamattsson commented 3 years ago

> Ok, I think that should be a relatively easy change.
>
> I've been tinkering with this a bit and have found a few issues/suggestions:
>
> 1. The `pool/dataset` queries should be locked to what's given in the Storage Class definition (working on this now)

Sounds good.
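
For illustration, a sketch of restricting the query client-side to the root dataset named in the Storage Class (the `backend.get` helper and the `name` field are assumptions, not the PR's actual code):

```python
def datasets_under_root(backend, root_dataset):
    """Only consider datasets at or below the root dataset given in the Storage Class."""
    datasets = backend.get("pool/dataset")  # assumed generic GET helper returning a list of dicts
    prefix = root_dataset.rstrip("/") + "/"
    return [ds for ds in datasets
            if ds["name"] == root_dataset or ds["name"].startswith(prefix)]
```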

> 2. Password is leaked to the logs when debug logging is enabled

Yes, this is an issue, thanks for reminding me.
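
One possible mitigation, sketched: mask credential-like fields before anything hits the debug log (the key names are illustrative):

```python
SENSITIVE_KEYS = {"password", "secret", "api_key", "authorization"}

def redact(payload):
    """Return a copy of a dict with credential-like values masked for logging."""
    if not isinstance(payload, dict):
        return payload
    return {
        key: "********" if key.lower() in SENSITIVE_KEYS else redact(value)
        for key, value in payload.items()
    }

# logger.debug("request body: %s", redact(body))
```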

> 3. Each class should instantiate its own logger

That makes sense. Python is not my day job.
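
For reference, the usual pattern is a logger scoped to the module and class rather than the root logger; a minimal sketch (the class name is just an example):

```python
import logging

class Backend:
    def __init__(self):
        # One logger per class, named after module and class, so output can be filtered per component
        self.logger = logging.getLogger(f"{__name__}.{self.__class__.__name__}")

    def connect(self, host):
        self.logger.debug("connecting to %s", host)
```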

> 4. Getters/setters should be added to the backend instead of directly modifying instance variables

See above.
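
In Python this is typically done with properties rather than explicit get_/set_ methods; a sketch of what that could look like on the backend (the `host` attribute is hypothetical):

```python
class Backend:
    def __init__(self, host):
        self._host = host

    @property
    def host(self):
        return self._host

    @host.setter
    def host(self, value):
        # Validation lives in one place instead of every call site mutating the attribute directly
        if not value:
            raise ValueError("host must not be empty")
        self._host = value
```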

> 5. The [upstream helm chart](https://artifacthub.io/packages/helm/hpe-storage/hpe-csi-driver) doesn't allow disabling the unneeded components. Allowing these to be disabled (e.g. [nimble-csp](https://github.com/hpe-storage/co-deployments/blob/master/helm/charts/hpe-csi-driver/templates/nimble-csp.yaml)) would eliminate the need to maintain a copy of core components via a sub-chart.

This is being taken care of. Customers have been complaining about this, and you will have the option of disabling any of the HPE CSPs delivered with the chart if you so desire. This will be part of the next release of the CSI driver chart, end of May/June. That way I can create a TrueNAS chart that depends on the HPE chart, and you'll only end up with what you need.

maxirus commented 3 years ago

Ok, Auth should be good to go now. Let me know if there are any remaining edits I may have missed.

On a side note, I am noticing that deletion is failing with the following error:

    E0225 03:25:57.365278 1 controller.go:1334] delete "pvc-a7351b99-9b7b-429d-9748-568c984ed1a7": volume deletion failed: rpc error: code = Internal desc = Volume pvc-a7351b99-9b7b-429d-9748-568c984ed1a7 with ID tank_k8s-iscsi_pvc-a7351b99-9b7b-429d-9748-568c984ed1a7 is still in use

I suspect this may be due to me deleting the CSI driver after I created the volumes. Does the upstream hpe-csi-driver maintain a "state"? I've checked the nodes and do not see the volumes mounted, so I suspect "volume still in use" is a red herring.

datamattsson commented 3 years ago

Hey, so I've been contemplating this a bit. Could we organize these PRs into more bite-sized chunks?

  1. HTTP Basic authentication
  2. Password leak
  3. Separate Class loggers
  4. Getter/setter
  5. pool/dataset locking during queries

The README I can take care of after everything is tested and has been merged. I'm installing FreeNAS 11.3 in my test bed and will make sure it passes e2e testing.

datamattsson commented 3 years ago

@maxirus I submitted a PR to your branch with some changes. Once those are merged we can close this PR.