open-iscsi / tcmu-runner

A daemon that handles the userspace side of the LIO TCM-User backstore.
Apache License 2.0
190 stars 148 forks source link

tcmur_device: add priv lock support #667

Open lxbsz opened 3 years ago

lxbsz commented 3 years ago

When the tcmu-runner detect that the lock is lost, it will try to queue a work event to reopen the image and at the same time queue a work event to update the service status. While the reopen is not atomic, and there has a gap between image close and image open, during which the rbd image's state resource will be released and if the update status event is fired, we will hit the crash bug.

This commit will add one rdev->priv_lock to protect the private data in rdev struct. For the service status updating code just skip it if it's in the reopen gap. And for all the other IOs just return EBUSY to let the client try it again.

Signed-off-by: Xiubo Li xiubli@redhat.com

idryomov commented 3 years ago

I would suggest splitting unrelated bug fixes ("rbd: fix use-after-free of addr", "rbd: fix memory leak when fails to get the address" and "rbd: fix and add more debug logs") into a separate PR.

lxbsz commented 3 years ago

I'm confused about the scope of the issue. The description says that the problem is that the status update handler could encounter a NULL state but the PR goes on to change all handlers, including I/O handers such as read and write. Would tcmu-runner core really initiate I/O on a closed image?

Yeah, it's possible.

Could we instead look at something like TCMUR_DEV_FLAG_IS_OPEN? Wrapping all handlers with rdev->priv_lock seems too heavyweight to me.

Let me check it more about this and have a try.

I would suggest splitting unrelated bug fixes ("rbd: fix use-after-free of addr", "rbd: fix memory leak when fails to get the address" and "rbd: fix and add more debug logs") into a separate PR.

Done.

lxbsz commented 3 years ago

@idryomov Please take a look, thanks.

idryomov commented 3 years ago

I think I'm still missing something. I'm going to ignore the renames for the moment and speak in terms of what is currently in master.

From my reading of the description, the problem is that ->report_event() and __tcmu_reopen_dev() can race and ->report_event() might crash on a momentarily closed device. Now ->report_event() is currently called under state_lock and __tcmu_reopen_dev() clears TCMUR_DEV_FLAG_IS_OPEN under state_lock before proceeding with closing. Wouldn't it be sufficient to just add a TCMUR_DEV_FLAG_IS_OPEN check to __tcmu_report_event()? Why TCMUR_DEV_FLAG_REPORTING_EVENT and a new condition variable are necessary?

lxbsz commented 3 years ago

I think I'm still missing something. I'm going to ignore the renames for the moment and speak in terms of what is currently in master.

From my reading of the description, the problem is that ->report_event() and __tcmu_reopen_dev() can race and ->report_event() might crash on a momentarily closed device. Now ->report_event() is currently called under state_lock and __tcmu_reopen_dev() clears TCMUR_DEV_FLAG_IS_OPEN under state_lock before proceeding with closing. Wouldn't it be sufficient to just add a TCMUR_DEV_FLAG_IS_OPEN check to __tcmu_report_event()? Why TCMUR_DEV_FLAG_REPORTING_EVENT and a new condition variable are necessary?

The reopen and event report will be run in two different threads.

The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic.

In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

The use-after-free bug still exists...

We need to let the reopen thread wait a bit to be sure that the event report thread has finished.

lxbsz commented 3 years ago

Run the following test for 2 hours, worked fine for me.

[root@client ~]# lsblk 
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   25G  0 disk  
├─sda1        8:1    0    1G  0 part  /boot
└─sda2        8:2    0   24G  0 part  
  ├─cl-root 253:0    0 21.9G  0 lvm   /
  └─cl-swap 253:1    0  2.1G  0 lvm   [SWAP]
sdb           8:16   0    1M  0 disk  
└─mpathd    253:2    0    1M  0 mpath 
sdc           8:32   0    1M  0 disk  
└─mpathd    253:2    0    1M  0 mpath 
sr0          11:0    1  8.6G  0 rom   

# while [ 1 ]; do dd if=/dev/zero of=/dev/sdb bs=1K count=1024; sleep 1; dd if=/dev/zero of=/dev/sdc bs=1K count=1024; done
idryomov commented 3 years ago

The reopen and event report will be run in two different threads.

The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic.

In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

In master, state_lock is held across ->report_event() so if while reopen is indeed not atomic, event report is. __tcmu_reopen_dev() would not be able to clear TCMUR_DEV_FLAG_IS_OPEN and close the image while ->report_event() is executing.

You are removing state_lock protection from ->report_event() though, saying that it (now renamed to rdev_lock) "should always be released when calling the handler's hooks". Can you elaborate on why it is not OK to hold it across ->report_event() as it is currently done?

lxbsz commented 3 years ago

The reopen and event report will be run in two different threads. The reopen will be split into handler->close() and handler->open() without holding the state_lock, that means the reopen is none atomic. In case that just after the __tcmu_report_event() checked that the TCMUR_DEV_FLAG_IS_OPEN flag is set and then it will try to access to the rbd_state private data to report the events to ceph cluster. While at the same time the reopen thread could be fired, which will clear the TCMUR_DEV_FLAG_IS_OPEN flag and release the rbd_state private data in handler->close().

In master, state_lock is held across ->report_event() so if while reopen is indeed not atomic, event report is. __tcmu_reopen_dev() would not be able to clear TCMUR_DEV_FLAG_IS_OPEN and close the image while ->report_event() is executing.

You are removing state_lock protection from ->report_event() though, saying that it (now renamed to rdev_lock) "should always be released when calling the handler's hooks". Can you elaborate on why it is not OK to hold it across ->report_event() as it is currently done?

As I remembered long time ago as discussed, the rule is that the state_lock should be only used in the libtcmu, and shouldn't be used in the handlers. And when calling any handler hook we should release the state_lock, because there may have third part handlers could call libtcmu's helpers, which will be possibly acquire the state_lock again, potentially will introduce dead lock bug.

Or possibly in the handler's hooks it will sleep, so holding the state_lock is not a good idea.

The current code in master is buggy when begin to support event report feature.