I hate to complain, but this is so unbelievably painful. First it happened with my MinIO, so I ferried everything out of that PV into the MinIO server in TrueNAS.
Now Postgres, Redis, and MySQL are down, so none of my services work.
I know this is a free product and it doesn't come with a warranty. When this thing works, it works very well!
But when it doesn't... life is not so great :\
Not sure if others are experiencing the same problem, but if there's any other way I can help, please let me know!
To troubleshoot, I've restarted both my TrueNAS appliance and my Kubernetes node; this appears to have no effect on any of the misbehaving volumes.
I tried deleting and recreating the PV and PVC for the volumes, which didn't work.
I've also tried disabling and re-enabling the extent in TrueNAS, to no avail.
Here are some logs from k logs -f hpe-csi-controller-56bc8cdd4-k9v5t -c csi-attacher | grep pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb:
I0308 10:41:59.970916 1 csi_handler.go:598] CSIHandler: processing PV "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb": no deletion timestamp, ignoring
I0308 10:41:59.971750 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:42:03.484858 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006bd320] [0xc000709140 0xc000709150 0xc000709160] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:42:03.488421 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006bd320] [0xc000709140 0xc000709150 0xc000709160] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:44:00.145298 1 csi_handler.go:332] PV finalizer is already set on "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:44:00.148045 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:44:01.487843 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006bde40] [0xc0008944c0 0xc0008944d0 0xc0008944e0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:44:01.491140 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006bde40] [0xc0008944c0 0xc0008944d0 0xc0008944e0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:49:01.491425 1 csi_handler.go:332] PV finalizer is already set on "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:49:01.494429 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:49:02.824599 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc000662af0] [0xc0007081c0 0xc0007081d0 0xc0007081e0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:49:02.827512 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc000662af0] [0xc0007081c0 0xc0007081d0 0xc0007081e0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:51:59.970103 1 csi_handler.go:332] PV finalizer is already set on "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:51:59.971032 1 controller.go:235] Started PV processing "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:51:59.971037 1 csi_handler.go:594] CSIHandler: processing PV "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:51:59.971043 1 csi_handler.go:598] CSIHandler: processing PV "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb": no deletion timestamp, ignoring
I0308 10:51:59.972549 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:52:03.712566 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006fa0c0] [0xc000047140 0xc000047150 0xc000047160] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:52:03.716687 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006fa0c0] [0xc000047140 0xc000047150 0xc000047160] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:54:02.827899 1 csi_handler.go:332] PV finalizer is already set on "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:54:02.830888 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:54:04.421045 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006fb8c0] [0xc00082f100 0xc00082f110 0xc00082f120] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:54:04.424571 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc0006fb8c0] [0xc00082f100 0xc00082f110 0xc00082f120] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:59:04.424859 1 csi_handler.go:332] PV finalizer is already set on "pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"
I0308 10:59:04.428183 1 connection.go:183] GRPC request: {"node_id":"5a59a5cd-4f1c-6b33-7330-000000000000","secrets":"***stripped***","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","deduplication":"OFF","fsType":"ext4","root":"pool0/csi-volumes","storage.kubernetes.io/csiProvisionerIdentity":"1612770601328-8081-csi.hpe.com","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb"}
I0308 10:59:05.769993 1 connection.go:186] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc000832f00] [0xc000047f80 0xc000047f90 0xc000047fa0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
I0308 10:59:05.773306 1 csi_handler.go:222] Error processing "csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb for node &{ k3s0 5a59a5cd-4f1c-6b33-7330-000000000000 [0xc000832f00] [0xc000047f80 0xc000047f90 0xc000047fa0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
Here are some logs from k logs -f truenas-csp-7c7d98c5d-grq7q | grep pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb:
Mon, 08 Mar 2021 10:12:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:12:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:15:58 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:15:59 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:18:15 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:18:15 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:18:54 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:18:55 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:18:57 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:19:01 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:19:10 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:19:26 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:01 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:04 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:08 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:16 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:20:32 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:24 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:25 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:26 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:27 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:27 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:28 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:29 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:30 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:39 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:39 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:56 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:21:57 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:22:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:22:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:22:29 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:22:30 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:24:39 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:24:39 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:28:56 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:28:57 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:32:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:32:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:33:57 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:33:58 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:38:59 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:38:59 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:42:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:42:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:44:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:44:01 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:49:01 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:49:02 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:52:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:52:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:54:03 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:54:04 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:59:04 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 10:59:05 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 11:02:00 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 11:02:03 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Here are some recent logs from k logs -f truenas-csp-7c7d98c5d-grq7q:
Mon, 08 Mar 2021 11:02:05 +0000 backend ERROR Exception: Traceback (most recent call last):
File "/app/truenascsp.py", line 148, in on_put
'serial_number': extent.get('naa').lstrip('0x'),
AttributeError: 'NoneType' object has no attribute 'lstrip'
Mon, 08 Mar 2021 11:02:31 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-056bc194-047d-4d66-8b91-55f922da3f0b
Mon, 08 Mar 2021 11:02:33 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e0176a83-a87e-462e-8457-830998630492
Mon, 08 Mar 2021 11:02:35 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-a488756b-72cb-42b1-b74b-15e9a4727ce8
Mon, 08 Mar 2021 11:02:50 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-7c561bf2-8af1-4248-b72a-db7f7e384af0
Mon, 08 Mar 2021 11:02:55 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e3d6abd6-e86b-4dc9-bf27-5c637f84cca0
Mon, 08 Mar 2021 11:03:04 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-dbcf41a5-ca4b-4937-89b5-4b3994ff36ab
Mon, 08 Mar 2021 11:03:08 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e8c5f751-6ea3-41ea-b028-4d54a9cae47e
Mon, 08 Mar 2021 11:03:09 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e525f74e-8565-4ac1-b9d3-ee5c0bcd0ea2
Mon, 08 Mar 2021 11:03:36 +0000 backend INFO Volume found: nvme0_csi-volumes_pvc-072bafae-04d4-48c4-a127-0f55a65da967
Mon, 08 Mar 2021 11:03:50 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-056bc194-047d-4d66-8b91-55f922da3f0b
Mon, 08 Mar 2021 11:04:06 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 11:04:06 +0000 backend INFO Host updated: 5a59a5cd-4f1c-6b33-7330-000000000000
Mon, 08 Mar 2021 11:04:06 +0000 backend ERROR Backend Request (DELETE) Exception: Traceback (most recent call last):
File "/app/backend.py", line 281, in delete
self.req_backend.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://192.168.1.202/api/v2.0/iscsi/target/id/34
Mon, 08 Mar 2021 11:04:06 +0000 backend ERROR Backend Request (DELETE) Exception: Traceback (most recent call last):
File "/app/backend.py", line 281, in delete
self.req_backend.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://192.168.1.202/api/v2.0/iscsi/extent/id/34
Mon, 08 Mar 2021 11:04:06 +0000 backend INFO Volume unpublished: pool0_csi-volumes_pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb
Mon, 08 Mar 2021 11:04:06 +0000 backend ERROR Backend Request (POST) Exception: Traceback (most recent call last):
File "/app/backend.py", line 242, in post
self.req_backend.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://192.168.1.202/api/v2.0/iscsi/target
Mon, 08 Mar 2021 11:04:07 +0000 backend ERROR Backend Request (POST) Exception: Traceback (most recent call last):
File "/app/backend.py", line 242, in post
self.req_backend.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://192.168.1.202/api/v2.0/iscsi/extent
Mon, 08 Mar 2021 11:04:07 +0000 backend ERROR Backend Request (POST) Exception: Traceback (most recent call last):
File "/app/backend.py", line 242, in post
self.req_backend.raise_for_status()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://192.168.1.202/api/v2.0/iscsi/targetextent
Mon, 08 Mar 2021 11:04:07 +0000 backend ERROR Exception: Traceback (most recent call last):
File "/app/truenascsp.py", line 148, in on_put
'serial_number': extent.get('naa').lstrip('0x'),
AttributeError: 'NoneType' object has no attribute 'lstrip'
Mon, 08 Mar 2021 11:04:18 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e0176a83-a87e-462e-8457-830998630492
Mon, 08 Mar 2021 11:04:20 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e8c5f751-6ea3-41ea-b028-4d54a9cae47e
Mon, 08 Mar 2021 11:04:29 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-a488756b-72cb-42b1-b74b-15e9a4727ce8
Mon, 08 Mar 2021 11:04:37 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-e3d6abd6-e86b-4dc9-bf27-5c637f84cca0
Mon, 08 Mar 2021 11:04:38 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-7c561bf2-8af1-4248-b72a-db7f7e384af0
Mon, 08 Mar 2021 11:04:50 +0000 backend INFO Volume found: pool0_csi-volumes_pvc-dbcf41a5-ca4b-4937-89b5-4b3994ff36ab
The Kubernetes node and the TrueNAS appliance have both been up for some time now.
This is really odd. What version of TrueNAS are you running? Did you upgrade TrueNAS and it went south? Somehow the extent API endpoints seem broken: https://192.168.1.202/api/v2.0/iscsi/extent/id/34 and https://192.168.1.202/api/v2.0/iscsi/extent.
Turning on debug logging might reveal more but be careful pasting it here as it contains your API key.
If you can give me your exact TrueNAS version and how many nodes and PVs you have, I'll try to reproduce it tonight (this is not my day job).
Edit: I just wonder if it's as easy as the iSCSI service simply not being started on your TrueNAS?
Hey, thanks so much for your response!
> What version of TrueNAS are you running?
Version: TrueNAS-12.0-U2.1
> Did you upgrade TrueNAS and it went south?
I did upgrade my TrueNAS recently, and it went smoothly. From what I recall, I stopped my Kubernetes node when I did the TrueNAS upgrade, rebooted the TrueNAS node, and then started the Kubernetes node again. At that point, all my volumes came back properly. Things didn't start going south until I ran a helm upgrade on my databases and the containers got recreated.
> Turning on debug logging might reveal more but be careful pasting it here as it contains your API key.
I think I see how to do that.
> how many nodes
It was one node, then two nodes, then back to one node.
> and PVs you have
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM   STORAGECLASS   REASON   AGE
pvc-056bc194-047d-4d66-8b91-55f922da3f0b   8Gi   RWO   Delete   Bound   monica/monica   truenas-ssd   28d
pvc-072bafae-04d4-48c4-a127-0f55a65da967   100Gi   RWO   Delete   Bound   elasticsearch/elasticsearch-master-elasticsearch-master-0   truenas-nvme   27d
pvc-293084b3-1c85-43f5-af07-408093ebf78b   1Gi   RWO   Delete   Bound   funkwhale/funkwhale-frontend   truenas-ssd   28d
pvc-3ae39163-1026-448c-861f-b07f7a1f531b   1Gi   RWO   Delete   Bound   prosody/prosody   truenas-ssd   28d
pvc-4f147b42-4d1a-4b8b-90f5-b5a502bc21f5   1Gi   RWO   Delete   Bound   bitwardenrs/bitwardenrs   truenas-ssd   28d
pvc-633c5390-02b7-49fb-9c4d-42f6ed67a12f   1Gi   RWO   Delete   Bound   relay/relay   truenas-ssd   28d
pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb   1Gi   RWO   Delete   Bound   mattermost/mattermost-mattermost-team-edition-plugins   truenas-ssd   15h
pvc-7c561bf2-8af1-4248-b72a-db7f7e384af0   200Gi   RWO   Delete   Bound   ipfs/ipfs-ipfs-0   truenas-ssd   28d
pvc-826a81fa-6678-4ad0-a158-eb7e37db9657   200Gi   RWO   Delete   Bound   postgresql/data-postgresql-postgresql-0   truenas-nvme   29d
pvc-928e5471-292a-46cf-8b49-5b5245c37f90   8Gi   RWO   Delete   Bound   mysql/data-mysql-0   truenas-nvme   27d
pvc-a2a6b059-eb0e-4cf3-8f06-205deb8da0ec   500Gi   RWX   Delete   Bound   media/media   truenas-nfs   17d
pvc-a3a4f51a-2b36-4c54-9b0d-9dfc604eceec   20Gi   RWO   Delete   Bound   lemmy/pictrs   truenas-ssd   28d
pvc-a488756b-72cb-42b1-b74b-15e9a4727ce8   10Gi   RWO   Delete   Bound   peertube/peertube   truenas-ssd   28d
pvc-dbcf41a5-ca4b-4937-89b5-4b3994ff36ab   1Gi   RWO   Delete   Bound   freshrss/freshrss   truenas-ssd   18d
pvc-e0176a83-a87e-462e-8457-830998630492   1Gi   RWO   Delete   Bound   noip2/noip2   truenas-ssd   28d
pvc-e3d6abd6-e86b-4dc9-bf27-5c637f84cca0   5Gi   RWO   Delete   Bound   media/jellyfin   truenas-ssd   17d
pvc-e525f74e-8565-4ac1-b9d3-ee5c0bcd0ea2   8Gi   RWO   Delete   Bound   wordpress/wordpress   truenas-ssd   29d
pvc-e8c5f751-6ea3-41ea-b028-4d54a9cae47e   8Gi   RWO   Delete   Bound   thelounge/thelounge   truenas-ssd   28d
pvc-f11f572c-84f8-44f3-bbde-570ccc6d3d6b   100Gi   RWO   Delete   Bound   funkwhale/funkwhale-data   truenas-ssd   28d
pvc-f255d8e4-fb90-47cf-a56b-2cfb59bf5988   16Gi   RWO   Delete   Bound   redis/redis-data-redis-master-0   truenas-nvme   27d
- redis
- postgresql
- mysql
- mattermost
are having issues (though there's nothing in mattermost)
> Edit: I just wonder if it's as easy as the iSCSI service simply not being started on your TrueNAS?
I just checked, and iSCSI is running. I've tried rebooting the TrueNAS server since the problem started, and the same volumes have the same problems. All the other volumes get processed fine and my pods come up. It's just my poor databases that are having issues right now (and Mattermost, which doesn't matter... ironically).
> I'll try to reproduce it tonight (this is not my day job).
Yes, thank you so much for investigating. I appreciate that this isn't your day job and that you have other responsibilities. This really is a fine product! Other than this occasional volume rot problem, it works really, really well! truenas-csp has been a game changer for me!
I'll investigate how to enable debugging and see if I can get more logs out of TrueNAS.
Apologies about the debug logging. I thought that was parameterized in the YAML, but set LOG_DEBUG as an environment variable to "true" in the CSP YAML manifest.
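If it helps, here is a minimal sketch of doing that from the command line instead of hand-editing the manifest. The deployment name truenas-csp is inferred from the pod name above, and you may need a -n <namespace> flag depending on where the CSP runs.

```shell
# Hedged sketch: enable CSP debug logging by injecting the LOG_DEBUG environment variable.
# "deployment/truenas-csp" is inferred from the pod name truenas-csp-7c7d98c5d-grq7q seen above.
kubectl set env deployment/truenas-csp LOG_DEBUG=true

# Confirm the variable is set, then tail the (now much more verbose) logs.
# Remember the debug output includes the API key, so scrub it before pasting anywhere.
kubectl set env deployment/truenas-csp --list
kubectl logs -f deploy/truenas-csp
```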
I upgraded my TrueNAS box to 12.0-U2.1 and deployed Redis on Kubernetes and rebooted, evicted and cordoned nodes to ensure the workload could move around. This is working as expected.
I'm just curious what the actual NAA is on the extents of those PVs that are having problems? You can find this under Sharing > Block (iSCSI) > Extents. You can also delete the "Targets" and "Extents" of the troublesome PVs and see if that helps; they will be re-created on the next node publish request.
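If the UI is tedious, here is a rough sketch of pulling the same information from the TrueNAS API. The endpoint and the naa field name come from the CSP traceback above, while the Bearer-token auth and the jq filter are illustrative assumptions.

```shell
# Hedged sketch: list iSCSI extents and their NAA serials straight from the TrueNAS v2.0 API.
# Replace $TRUENAS_API_KEY with your own API key; -k tolerates a self-signed certificate.
curl -sk -H "Authorization: Bearer $TRUENAS_API_KEY" \
  https://192.168.1.202/api/v2.0/iscsi/extent | jq '.[] | {name, naa}'
```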
Aw man, you beat me to it! :D That's exactly what I ended up doing:
- set persistentVolumeReclaimPolicy to Retain
- saved the PersistentVolume and PersistentVolumeClaim manifests, then pruned the PersistentVolumeClaim, the PersistentVolume, the Extent, and the Target
- kubectl apply -f the old PV and PVC manifests
- the Target and Extent get recreated

Within the past hour I got my databases running again. A rough sketch of that sequence is below.
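For anyone else who lands here, this is roughly that sequence with kubectl, using the stuck Mattermost PV from the list above as the example. The Extent and Target deletion still happens in the TrueNAS UI, and the exact clean-up of the saved YAML may vary.

```shell
# Hedged sketch of the recovery steps described above.
PV=pvc-70cc2c2f-8c25-45bc-ab2b-c1c65cb97ceb

# 1. Make sure the backing zvol survives deletion of the PV object.
kubectl patch pv "$PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2. Save the PV and PVC manifests (claim name and namespace taken from the PV list above).
kubectl get pv "$PV" -o yaml > pv.yaml
kubectl -n mattermost get pvc mattermost-mattermost-team-edition-plugins -o yaml > pvc.yaml

# 3. Delete the Kubernetes objects, then delete the matching Extent and Target in the TrueNAS UI.
#    Scale the workload to zero first so the pvc-protection finalizer lets the delete finish.
kubectl -n mattermost delete pvc mattermost-mattermost-team-edition-plugins
kubectl delete pv "$PV"

# 4. Re-apply the saved manifests; the CSP recreates the Target and Extent on the next publish.
#    You may need to strip metadata.uid/resourceVersion and the PV's spec.claimRef.uid first
#    so the re-created PV can bind to the re-created PVC.
kubectl apply -f pv.yaml -f pvc.yaml
```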
> Apologies about the debug logging. I thought that was parameterized in the YAML, but set LOG_DEBUG as an environment variable to "true" in the CSP YAML manifest.
I'll play with that right now
> I upgraded my TrueNAS box to 12.0-U2.1, deployed Redis on Kubernetes, and rebooted, evicted, and cordoned nodes to ensure the workload could move around. This is working as expected.
It's a very intermittent problem. Perhaps scale your workloads to zero and back up again repeatedly? That's all I can think of to recreate the problem; a sketch of what I mean follows.
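Something like this is what I had in mind, as a sketch; the redis-master StatefulSet name and the redis namespace are guesses based on the PVC redis/redis-data-redis-master-0.

```shell
# Hedged sketch: flap a workload down and up until an attach fails to complete.
# The StatefulSet and namespace names are assumptions; adjust them to whatever you deployed.
while true; do
  kubectl -n redis scale statefulset redis-master --replicas=0
  kubectl -n redis wait --for=delete pod/redis-master-0 --timeout=120s || true
  kubectl -n redis scale statefulset redis-master --replicas=1
  # Stop looping the moment a pod fails to come back, i.e. when the problem reproduces.
  kubectl -n redis wait --for=condition=Ready pod/redis-master-0 --timeout=300s || break
done
```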
> I'm just curious what the actual NAA is on the extent of those PVs that are having problems?
As I've deleted them, I don't have them anymore :( but when it happens again, I'll look for that. I'm sorry I don't have that to report now.
> You can also delete the "Targets" and "Extents" of the troublesome PVs and see if that helps; they will be re-created on the next node publish request.
Is deletion of the PV and PVC also necessary? I think I tried just deleting the Target and the Extent, but the volume still wouldn't mount.
> Is deletion of the PV and PVC also necessary? I think I tried just deleting the Target and the Extent, but the volume still wouldn't mount.
Deleting the PVC/PV should not be necessary. However, the back-off could be minutes before you see the workload recover if it's been retrying for a while.
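In the meantime, here is a small sketch of how to see where the attach retries stand rather than waiting blind; the VolumeAttachment name is the long csi- identifier from the csi-attacher log earlier in the thread.

```shell
# Hedged sketch: inspect the attach state for the stuck volume.
kubectl get volumeattachments
kubectl describe volumeattachment csi-a00a62e75930374aa58e9f0befc11792ff595c0dcd0ba7f5d1889940e2114bd9

# The pod's events show the matching FailedAttachVolume / FailedMount messages and the retry cadence.
# Replace <stuck-pod-name> with whichever pod is stuck waiting on the volume.
kubectl -n mattermost get pods
kubectl -n mattermost describe pod <stuck-pod-name>
```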
Gotcha.
I apologize that I don't have much more to give you to help troubleshoot this further :(
Thank you for being so responsive in helping me fix this.
If this happens again, I'll hold off on deleting things so that I/we can investigate further.
Again, I really appreciate your work on this.
Aside from flapping my workloads, is there anything else I can check that might help you diagnose this?
No worries, we'll try to catch it live next time. I'm leaving my workload scaling up and down overnight and we'll see what it yields. If it hits again, we need to inspect the extents and targets: basically, why the TrueNAS API is throwing an HTTP 422, and whether there's any other content besides the HTTP status code that might reveal a clue.
Gotcha, that sounds good.
Thanks again for the assistance, and have a great day!
Any tips on this?