open-iscsi / rtslib-fb

Python library for configuring the Linux kernel-based multiprotocol SCSI target (LIO)
Apache License 2.0

Failed : disk create/update failed on xxx. LUN allocation failure #181

Closed TaylorTrz closed 1 year ago

TaylorTrz commented 2 years ago

Description

Hello, this error appears when I create an RBD image for the iSCSI gateway.

/iscsi-target...-igw/gateways> cd /disks
/disks> create pool=rbd image=disk_2 size=20G
Failed : disk create/update failed on xxx. LUN allocation failure

To investigate further, I checked the log and tried debugging rtslib-fb:

debug message

> /usr/local/lib/python3.6/dist-packages/rtslib_fb/tcm.py(177)_enable()
-> fwrite(path, "1\n")
(Pdb) n
FileNotFoundError: [Errno 2] No such file or directory
(Pdb) p path
'/sys/kernel/config/target/core/user_0/rbd.disk_2/enable'

It seems that rtslib-fb tries to write "1\n" to the configfs attribute '/sys/kernel/config/target/core/user_0/rbd.disk_2/enable' and fails with [Errno 2]. So I also checked the file's permissions (rw) and tried the same thing from the shell and from Python:

/sys/kernel/config/target/core/user_0/rbd.disk_2# l
action/  alua/       attrib/  enable  lba_map  statistics/  wwn/
alias    alua_lu_gp  control  info    pr/      udev_path
root@mgt04:/sys/kernel/config/target/core/user_0/rbd.disk_2# ls -alh enable
-rw-r--r-- 1 root root 4.0K Jan 20 04:00 enable
>>> with open("/sys/kernel/config/target/core/user_0/rbd.disk_2/enable", "r") as fd:
...     fd.read()
...
'0\n'
>>> with open("/sys/kernel/config/target/core/user_0/rbd.disk_2/enable", "w") as fd:
...     fd.write("1\n")
...
2
FileNotFoundError: [Errno 2] No such file or directory

This confuses me. Please let me know if you have any idea what is going on, thanks.
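One detail in the transcript above is easy to miss: `fd.write("1\n")` returned 2, so the error only surfaced when the buffered data was flushed as the file was closed. A minimal sketch, using only the Python standard library, that writes without buffering and reports which system call fails. The helper name `write_attr` is hypothetical and is not part of rtslib-fb.

```python
import errno
import os


def write_attr(path, value):
    """Write value to path without buffering.

    Returns None on success, or a (stage, errno_name) tuple where stage is
    "open" or "write", so the failing system call is visible.
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as err:
        # The path itself could not be opened (missing, no permission, ...).
        return ("open", errno.errorcode.get(err.errno, str(err.errno)))
    try:
        os.write(fd, value.encode())
    except OSError as err:
        # The file exists, but the write was rejected.
        return ("write", errno.errorcode.get(err.errno, str(err.errno)))
    finally:
        os.close(fd)
    return None
```

Calling `write_attr("/sys/kernel/config/target/core/user_0/rbd.disk_2/enable", "1\n")` and getting `("write", "ENOENT")` would suggest that the file exists and that the kernel side rejected the write, which is consistent with the `ls` output and the successful read of `'0\n'` above.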

Logs & Configs

vim /var/log/rbd-target-api/rbd-target-api.log

2022-01-20 02:53:11,911     INFO [lun.py:610:allocate()] - (LUN.allocate) created rbd/disk_2 successfully
2022-01-20 02:53:11,911    DEBUG [lun.py:649:allocate()] - Check the rbd image size matches the request
2022-01-20 02:53:11,912    DEBUG [lun.py:672:allocate()] - Begin processing LIO mapping
2022-01-20 02:53:11,912    DEBUG [lun.py:844:lio_stg_object()] - lio stg lookup failed Storage object user/rbd.disk_2 not found
2022-01-20 02:53:11,912     INFO [lun.py:855:add_dev_to_lio()] - (LUN.add_dev_to_lio) Adding image 'rbd/disk_2' to LIO backstore user:rbd
2022-01-20 02:53:11,912    DEBUG [lun.py:885:_add_dev_to_lio_user_rbd()] - control="max_data_area_mb=8,hw_max_sectors=1024"
2022-01-20 02:53:11,923    ERROR [lun.py:910:_add_dev_to_lio_user_rbd()] - failed to add rbd/disk_2 to LIO - error([Errno 2] No such file or directory)
2022-01-20 02:53:11,924    ERROR [rbd-target-api:1229:_disk()] - LUN alloc problem - failed to add rbd/disk_2 to LIO - error([Errno 2] No such file or directory)
2022-01-20 02:53:11,927     INFO [_internal.py:88:_log()] - ::1 - - [20/Jan/2022 02:53:11] "PUT /api/_disk/rbd/disk_2 HTTP/1.1" 500 -
2022-01-20 02:53:11,929    ERROR [rbd-target-api:2722:call_api()] - _disk change on localhost failed with 500
2022-01-20 02:53:11,932    DEBUG [rbd-target-api:2744:call_api()] - failed on mgt04. LUN allocation failure
2022-01-20 02:53:11,933     INFO [_internal.py:88:_log()] - ::1 - - [20/Jan/2022 02:53:11] "PUT /api/disk/rbd/disk_2 HTTP/1.1" 500 -

Environment

pip3 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
asn1crypto (0.24.0)
blinker (1.4)
ceph-iscsi (3.5)
certifi (2018.1.18)
chardet (3.0.4)
click (6.7)
colorama (0.3.7)
configshell-fb (1.1.29)
cryptography (2.1.4)
Flask (0.12.2)
idna (2.6)
itsdangerous (0.24)
Jinja2 (2.10)
keyring (10.6.0)
keyrings.alt (3.0)
MarkupSafe (1.0)
netifaces (0.10.4)
pip (9.0.1)
pycrypto (2.6.1)
pygobject (3.26.1)
pyinotify (0.9.6)
pyOpenSSL (17.5.0)
pyparsing (2.2.0)
pyudev (0.21.0)
pyxdg (0.25)
rados (2.0.0)
rbd (2.0.0)
requests (2.18.4)
rpdb (0.1.6)
rtslib-fb (2.1.74)
SecretStorage (2.3.1)
setuptools (39.0.1)
simplejson (3.13.2)
six (1.11.0)
targetcli-fb (2.1.54)
urllib3 (1.22)
urwid (2.0.1)
Werkzeug (0.14.1)
wheel (0.30.0)

os-release

cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

maurizio-lombardi commented 2 years ago

Hmm, interesting. It looks like the kernel returns -2 (ENOENT) when trying to write to the file, which would be a bug in the driver. Can you post the output of "uname -a" here?
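For reference, the -2 here follows the kernel's negative-errno convention, and errno 2 is what Python reports as `FileNotFoundError`. A small sketch using only the standard `errno` and `os` modules; the helper name `describe_kernel_return` is made up for illustration.

```python
import errno
import os


def describe_kernel_return(ret):
    """Map a negative kernel return value such as -2 to its errno name and message."""
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return "{} ({})".format(name, os.strerror(code))


print(describe_kernel_return(-2))
```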

maurizio-lombardi commented 2 years ago

It's user-backed storage, so tcmu-runner may be involved. Can you tell me the version of the tcmu-runner package?

TaylorTrz commented 2 years ago

Let me check...

# uname -a
Linux xxx 5.10.83 #3 SMP Fri Dec 3 11:13:00 CST 2021 x86_64 x86_64 x86_64 GNU/Linux

# tcmu-runner  -V
tcmu-runner 1.5.4

By the way, this demo was built in a privileged Docker container, started like this:

docker run -d -t \
    --network host \
    --volume=/root/tao:/root/tao \
    --volume=/lib/modules:/lib/modules \
    --env="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
    --privileged=true \
    --restart=always \
    --name=iscsi-gateway \
    iscsi-gateway:0.5 /sbin/init

Could this error be related to running in a container?
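One way to start narrowing this down from inside the container is to check whether configfs is mounted there and whether the TCMU kernel module is visible. This is a rough sketch, assuming the standard `/proc/mounts` and `/proc/modules` formats and assuming the user-backed backstore is provided by the `target_core_user` module. The helper names are hypothetical, and passing both checks would not rule the container out as a cause.

```python
def configfs_mountpoints(mounts_text):
    """Return mount points whose filesystem type is configfs.

    mounts_text is the content of /proc/mounts, where each line is
    "<device> <mountpoint> <fstype> <options> <dump> <pass>".
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "configfs":
            points.append(fields[1])
    return points


def module_loaded(modules_text, name):
    """Return True if name is listed in /proc/modules content.

    Modules compiled into the kernel do not appear in /proc/modules,
    so False does not prove the feature is missing.
    """
    return any(line.split()[0] == name
               for line in modules_text.splitlines() if line.strip())


def read_text(path):
    """Read a text file, returning an empty string if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("configfs mounted at:",
          configfs_mountpoints(read_text("/proc/mounts")))
    print("target_core_user listed:",
          module_loaded(read_text("/proc/modules"), "target_core_user"))
```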

maurizio-lombardi commented 2 years ago

Could this error be related to running in a container?

I will contact the tcmu-runner devs and post an update.

lxbsz commented 2 years ago

It does not fail in tcmu-runner. It fails in ceph-iscsi when calling rtslib's UserBackedStorageObject at line 945:

 932         try:
 933             # config string = rbd identifier / config_key (pool/image) /
 934             # optional osd timeout
 935             cfgstring = "rbd/{}/{};osd_op_timeout={}".format(self.pool,
 936                                                              self.image,                         
 937                                                              self.osd_op_timeout)
 938             if (settings.config.cephconf != '/etc/ceph/ceph.conf'):
 939                 cfgstring += ";conf={}".format(settings.config.cephconf)
 940   
 941             if (settings.config.cluster_client_name != 'client.admin'):
 942                 client_id = settings.config.cluster_client_name.split('.', 1)[1]
 943                 cfgstring += ";id={}".format(client_id)
 944   
 945             new_lun = UserBackedStorageObject(name=self.backstore_object_name,                                                                    
 946                                               config=cfgstring,                   
 947                                               size=self.size_bytes,               
 948                                               wwn=in_wwn, control=control_string) 
 949         except (RTSLibError, IOError) as err:
 950             self.error = True 
 951             self.error_msg = ("failed to add {} to LIO - "
 952                               "error({})".format(self.config_key, 
 953                                                  str(err)))
 954             self.logger.error(self.error_msg)   
 955             return None
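To make the excerpt easier to follow, the config string it builds can be reproduced in isolation. This is a standalone sketch of lines 935-943 only. The function name `build_cfgstring` and its parameters are stand-ins for the `self.*` and `settings.config.*` values in ceph-iscsi, and the timeout value 30 in the call is invented for the example.

```python
def build_cfgstring(pool, image, osd_op_timeout,
                    cephconf="/etc/ceph/ceph.conf",
                    cluster_client_name="client.admin"):
    """Build the config string in the same way as the excerpt above."""
    # config string = rbd identifier / config_key (pool/image) / osd timeout
    cfgstring = "rbd/{}/{};osd_op_timeout={}".format(pool, image, osd_op_timeout)
    # A non-default ceph.conf path is appended explicitly.
    if cephconf != "/etc/ceph/ceph.conf":
        cfgstring += ";conf={}".format(cephconf)
    # A non-default client name contributes only the part after "client.".
    if cluster_client_name != "client.admin":
        client_id = cluster_client_name.split(".", 1)[1]
        cfgstring += ";id={}".format(client_id)
    return cfgstring


print(build_cfgstring("rbd", "disk_2", 30))
# prints: rbd/rbd/disk_2;osd_op_timeout=30
```

This string is only the `config` argument. The "[Errno 2]" in the log is raised later, when rtslib writes to the `enable` attribute, as the debug session at the top of the issue shows.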
TaylorTrz commented 2 years ago

Exactly! The error comes from this call. There is also an update: I built the same ceph-iscsi demo environment on CentOS 7.4, and it works there.

TaylorTrz commented 1 year ago

Problem solved