kadalu / kadalu

A lightweight persistent storage solution for Kubernetes / OpenShift / Nomad using GlusterFS in the background. More information at https://kadalu.tech
https://docs.kadalu.tech/k8s-storage/devel/quick-start/

[Bug]: nomad controller exception #1033

Open nakermann1973 opened 7 months ago

nakermann1973 commented 7 months ago

Describe the bug: Running the controller on Nomad results in an exception when trying to delete a volume.

To Reproduce: Steps to reproduce the behavior:

  1. On Nomad, install the controller and node plugins using version 1.2.0, following https://github.com/kadalu/kadalu/tree/devel/nomad
  2. I already had a volume created by a previous installation (an older version). After upgrading to 1.2.0, I tried to delete the volume with `nomad volume delete myvol` and got the error shown below.

Expected behavior: The volume is deleted.

Actual behavior: The Nomad CLI returns an error: `Error deleting volume: Unexpected response code: 500 (rpc error: controller delete volume: rpc error: controller delete volume: CSI.ControllerDeleteVolume: rpc error: code = Unknown desc = Exception calling application : local variable 'sock' referenced before assignment)`

Controller logs:

[2023-12-05 21:07:20,348] ERROR [_server - 508:_call_behavior] - Exception calling application: local variable 'sock' referenced before assignment
Traceback (most recent call last):
  File "/kadalu/lib/python3.10/site-packages/grpc/_server.py", line 494, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/kadalu/controllerserver.py", line 415, in DeleteVolume
    delete_volume(request.volume_id)
  File "/kadalu/volumeutils.py", line 610, in delete_volume
    vol = search_volume(volname)
  File "/kadalu/volumeutils.py", line 780, in search_volume
    mount_glusterfs(volume, mntdir)
  File "/kadalu/volumeutils.py", line 926, in mount_glusterfs
    return handle_external_volume(volume, mountpoint, is_client, volume['g_host'])
  File "/kadalu/volumeutils.py", line 1032, in handle_external_volume
    g_host = reachable_host(hosts)
  File "/kadalu/kadalulib.py", line 133, in reachable_host
    if is_host_reachable([host], 22):
  File "/kadalu/kadalulib.py", line 124, in is_host_reachable
    sock.close()
UnboundLocalError: local variable 'sock' referenced before assignment
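The traceback ends with `is_host_reachable` calling `sock.close()` on a name that was never bound. In Python this typically means the cleanup code ran after the statement meant to create `sock` had already raised (for example, a refused connection or a failed name lookup). Below is a minimal sketch of a reachability probe that cannot hit this error. It assumes `is_host_reachable(hosts, port)` should return `True` when any host accepts a TCP connection on `port`, and that `reachable_host(hosts)` returns the first such host. The function bodies, the `timeout` parameter, and the `port=22` default (taken from the call in the traceback) are assumptions, not kadalu's actual code in `kadalulib.py`.

```python
import socket


def is_host_reachable(hosts, port, timeout=1.0):
    """Hypothetical probe: True if any host in `hosts` accepts TCP on `port`."""
    for host in hosts:
        try:
            # create_connection either returns a connected socket or raises,
            # and the `with` block only closes a socket that actually exists,
            # so there is no code path that touches an unbound `sock`.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts and DNS failures
            # (socket.timeout and socket.gaierror both subclass OSError).
            continue
    return False


def reachable_host(hosts, port=22):
    """Hypothetical helper: return the first reachable host, or None."""
    for host in hosts:
        if is_host_reachable([host], port):
            return host
    return None
```

With this shape, an unreachable Gluster host surfaces as `False` (or `None` from `reachable_host`), which the caller can turn into a meaningful error instead of an `UnboundLocalError`.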

Environment:

Additional context: Downgrading to 1.1.0 fixes the problem.

leelavg commented 2 months ago

The error seems similar to #1051; however, "downgrading to 1.1.0 fixes the problem" is a bit alarming, as it suggests there's a regression.

If possible, could you please take a look at the linked issue #1051 and provide any comments? I deployed Nomad a long time ago, so maybe something is broken, and I don't see a 1.2 branch 🤔

leelavg commented 2 months ago

1008 was the last bug I fixed that many users reported during volume deletion; I need to check which releases include it.