hpe-storage / truenas-csp

TrueNAS Container Storage Provider for HPE CSI Driver for Kubernetes
https://scod.hpedev.io
MIT License

422 Client Error #36

Closed ishioni closed 1 year ago

ishioni commented 1 year ago

I'm having some stability problems on the TrueNAS side (it seems scst is causing kernel panics). While debugging this, I've also noticed that truenas-csp tends to throw these errors when trying to mount a bunch of PVCs at the same time. Is there any way to debug this further?

Fri, 11 Nov 2022 11:14:28 +0000 backend INFO Volume found: SSD_k3s_pvc-ae7555bd-b818-4580-9411-1082593ad332
Fri, 11 Nov 2022 11:14:28 +0000 backend ERROR Backend Request (DELETE) Exception: Traceback (most recent call last):
  File "/app/backend.py", line 306, in delete
    self.req_backend.raise_for_status()
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://10.1.4.2/api/v2.0/iscsi/target/id/26

Fri, 11 Nov 2022 11:14:28 +0000 backend ERROR Backend Request (DELETE) Exception: Traceback (most recent call last):
  File "/app/backend.py", line 306, in delete
    self.req_backend.raise_for_status()
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://10.1.4.2/api/v2.0/iscsi/extent/id/26

Fri, 11 Nov 2022 11:14:28 +0000 backend INFO Volume unpublished: SSD_k3s_pvc-ae7555bd-b818-4580-9411-1082593ad332

datamattsson commented 1 year ago

Thanks for reporting this. Could you elaborate on the number of PVCs and such that are involved here so I can try to reproduce it? That 422 should probably be retried with a back-off before reporting the error to the CSI driver. The request above looks like an unmount being processed, not a mount?
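To illustrate the "retried with a back-off" idea, here is a minimal sketch in Python. It is not the CSP's actual code: `request_with_backoff`, `send`, `RETRYABLE_STATUS`, and the default attempt count and delay are hypothetical names and values chosen for illustration, and treating 422 as a transient condition is an assumption taken from the comment above.

```python
import time
from typing import Callable

# Assumption: 422 is treated as transient here, as suggested in the thread.
RETRYABLE_STATUS = {422}


def request_with_backoff(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable that performs one backend
    request and returns its HTTP status code. The last status seen is
    returned so the caller can decide whether to report an error.
    """
    status = send()
    for attempt in range(1, attempts):
        if status not in RETRYABLE_STATUS:
            break
        # Exponential back-off: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

With the `requests` library, `send` could be something like `lambda: requests.delete(url, headers=headers).status_code`, where `url` and `headers` are placeholders. The error would only be passed on to the CSI driver if the final status is still 422.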

ishioni commented 1 year ago

It's actually both. It happens when I schedule the creation/deletion of many pods at the same time (many = 5), almost as if the TrueNAS middleware can't keep up. Which is weird, since it's running on an i7-7700 with the NAS almost idle.

ishioni commented 1 year ago

I made a quick screengrab of a couple of things of interest while this is happening: https://drive.google.com/file/d/1O_vrPwbf9LaCh1PXOMvYr_Hq4GzIU_O5/view?usp=sharing

datamattsson commented 1 year ago

Thanks for providing this. It goes without saying, control plane scale testing of any sort has not been performed. Which version of TrueNAS/FreeNAS is that?

ishioni commented 1 year ago

22.02.04

ishioni commented 1 year ago

Hey, just an update: 22.12.0 fixed my kernel oops issue, but this problem still persists.

datamattsson commented 1 year ago

Thanks for letting me know. I'm going to work on this over the Christmas break.

datamattsson commented 1 year ago

v2.3.0 was just released. The CSP now handles error conditions such as 422 more thoroughly.