iXsystems / cinder

Fail gracefully if target/lun count is above 256 for CORE #27

Open william-gr opened 2 years ago

william-gr commented 2 years ago

The limit depends on the kern.cam.ctl.max_ports tunable.

databunnysg commented 2 years ago

Noted! Starting the investigation.

YiHuangDB commented 2 years ago

A first round of assessment: I set kern.cam.ctl.max_luns to 4, kern.cam.ctl.max_ports to 4, and ctl_load to YES as tunables in TrueNAS CORE 12, and I can see those values updated after a reboot. With these settings I can still create more than 5 volumes using the cinder driver. With the FreeBSD default value of kern.cam.ctl.max_ports (256) left unchanged, I can create more than 256 volumes using the cinder driver. Using iscsiadm from the client OS, I can also attach more than 5 iSCSI target sessions, and more than 256 iSCSI target sessions, successfully. I am now looking into any negative effects of kern.cam.ctl.max_ports and kern.cam.ctl.max_luns on OpenStack operations.
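For reference, FreeBSD's sysctl prints each value as "name: value", so the two limits can be read back with a small parser. This is only a minimal sketch: it assumes the sysctl output text has already been collected from the TrueNAS host by some other means, and parse_ctl_limits is a hypothetical helper, not part of the driver.

```python
def parse_ctl_limits(sysctl_output):
    """Extract the CTL limits from FreeBSD `sysctl` output text.

    Expects lines in the form "kern.cam.ctl.max_luns: 4".
    Returns a dict such as {"max_luns": 4, "max_ports": 4};
    a key is omitted if its sysctl line is not present.
    """
    # Hypothetical mapping from sysctl name to a short key.
    wanted = {
        "kern.cam.ctl.max_luns": "max_luns",
        "kern.cam.ctl.max_ports": "max_ports",
    }
    limits = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in wanted:
            limits[wanted[name]] = int(value.strip())
    return limits
```

Lines for any other sysctl name are ignored, so the full output of a broader sysctl query can be passed in unchanged.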

YiHuangDB commented 1 year ago

Here is a summary of the existing TrueNAS cinder driver's behavior under different kern.cam.ctl.max_ports/kern.cam.ctl.max_luns configurations:

  1. When the total number of OpenStack TrueNAS volumes exceeds kern.cam.ctl.max_ports/kern.cam.ctl.max_luns, creating or deleting a TrueNAS volume from OpenStack succeeds.
  2. When the total number of attached OpenStack TrueNAS volumes exceeds kern.cam.ctl.max_ports or kern.cam.ctl.max_luns, attaching or detaching a TrueNAS volume from OpenStack times out and then fails.

Still looking for a resolution to this issue.

YiHuangDB commented 1 year ago

Some updates:

The attach/detach timeout actually happens in the upstream cinder code here: https://github.com/openstack/cinder/blob/392e27aa950374041fbfc827a160f835fd438e70/cinder/volume/driver.py#L1129 The exception itself is then thrown further upstream in os-brick here: https://github.com/openstack/os-brick/blob/a519dd8d07a65896b6151087c6b38b5294129bb6/os_brick/initiator/connectors/iscsi.py#L505

A possible solution that does not affect the upstream code is to check that the cinder attached volume count is below kern.cam.ctl.max_ports/kern.cam.ctl.max_luns before returning the connection metadata to upstream, and to fail gracefully otherwise. The check would go here: https://github.com/iXsystems/cinder/blob/28e0d7bc60814b8affdfa09d655be6c15e2db225/driver/ixsystems/iscsi.py#L144
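A rough sketch of what such a guard might look like is below. It is not the actual fix: TargetLimitExceeded and check_attach_capacity are hypothetical names, the caller is assumed to already know the attached volume count and both tunable values, and a real driver would likely raise one of cinder's own exception types instead.

```python
class TargetLimitExceeded(Exception):
    """Hypothetical error raised when no more volumes can be attached."""


def check_attach_capacity(attached_count, max_ports, max_luns):
    """Fail fast if one more attach would exceed the CTL limits.

    attached_count: number of volumes currently attached (assumed known).
    max_ports, max_luns: values of kern.cam.ctl.max_ports / max_luns.
    Returns the number of free slots before this attach.
    """
    # The stricter of the two tunables is the effective limit.
    limit = min(max_ports, max_luns)
    if attached_count >= limit:
        raise TargetLimitExceeded(
            "attached volumes (%d) reached the limit (%d) set by "
            "kern.cam.ctl.max_ports/kern.cam.ctl.max_luns"
            % (attached_count, limit)
        )
    return limit - attached_count
```

Calling this before the connection metadata is returned would turn the upstream timeout into an immediate, descriptive error.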

Working on the actual code fix.