ublk-org / ublksrv

ublk: userspace block device driver
MIT License
149 stars 53 forks source link
block coroutine iouring linux nbd qcow2 storage ublk userspace

============================ Userspace block driver(ublk)

Introduction

This is the userspace daemon part(ublksrv) of the ublk framework, the other part is ublk driver [#userspace]_ which supports multiple queue.

The two parts communicate by io_uring's IORING_OP_URING_CMD with one per-queue shared cmd buffer for storing io command, and the buffer is read only for ublksrv, each io command can be indexed by io request tag directly, and the command is written by ublk driver, and read by ublksrv after getting notification from ublk driver.

For example, when one READ io request is submitted to ublk block driver, ublk driver stores the io command into cmd buffer first, then completes one IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to ublk driver beforehand by ublksrv for getting notification of any new io request, and each URING_CMD is associated with one io request by tag, so depth for URING_CMD is same with queue depth of ublk block device.

After ublksrv gets the io command, it translates and handles the ublk io request, such as, for the ublk-loop target, ublksrv translates the request into same request on another file or disk, like the kernel loop block driver. In ublksrv's implementation, the io is still handled by io_uring, and share same ring with IORING_OP_URING_CMD command. When the target io request is done, the same IORING_OP_URING_CMD is issued to ublk driver for both committing io request result and getting future notification of new io request.

So far, the ublk driver needs to copy io request pages into userspace buffer (pages) first for write before notifying the request to ublksrv, and copy userspace buffer(pages) to the io request pages after ublksrv handles READ. Also looks linux-mm can't support zero copy for this case yet. [#zerocopy]

More ublk targets will be added with this framework in future even though only ublk-loop and ublk-null are implemented now.

libublksrv is also generated, and it helps to integrate ublk into existed project. One example of demo_null is provided for how to make a ublk device over libublksrv.

Quick start

how to build ublksrv:

.. code-block:: console

autoreconf -i ./configure #pkg-config and libtool is usually needed make

note: './configure' requires liburing 2.2 package installed, if liburing 2.2 isn't available in your distribution, please configure via the following command, or refer to build_with_liburing_src [#build_with_liburingsrc]

.. code-block:: console

PKG_CONFIG_PATH=${LIBURING_DIR} \ ./configure \ CFLAGS="-I${LIBURING_DIR}/src/include" \ CXXFLAGS="-I${LIBURING_DIR}/src/include" \ LDFLAGS="-L${LIBURING_DIR}/src"

and LIBURING_DIR points to directory of liburing source code, and liburing needs to be built before running above commands. Also IORING_SETUP_SQE128 has to be supported in the liburing source.

c++20 is required for building ublk utility, but libublksrv and demo_null.c & demo_event.c can be built independently:

help

add one ublk-null disk

add one ublk-loop disk

or

remove one ublk disk

list ublk devices

unprivileged mode

Typical use case is container [#stefancontainer] in which user can manage its own devices not exposed to other containers.

At default, controlling ublk device needs privileged user, since /dev/ublk-control is permitted for administrator only, and this is called privileged mode.

For unprivilege mode, /dev/ublk-control needs to be allowed for all users, so the following udev rule need to be added:

KERNEL=="ublk-control", MODE="0666", OPTIONS+="static_node=ublk-control"

Also when new ublk device is added, we need ublk to change device ownership to the device's real owner, so the following rules are needed: ::

KERNEL=="ublkc*",RUN+="ublk_chown.sh %k"
KERNEL=="ublkb*",RUN+="ublk_chown.sh %k"

ublk_chown.sh can be found under utils/ too.

utils/ublk_dev.rules includes the above rules.

With the above two administrator changes, unprivileged user can create/delete/list/use ublk device, also anyone which isn't permitted can't access and control this ublk devices(ublkc/ublkb)

Unprivileged user can pass '--unprevileged' to 'ublk add' for creating unprivileged ublk device, then the created ublk device is only available for the owner and administrator.

use unprivileged ublk in docker

ublk_chown_docker.sh can be found under utils/.

.. code-block:: console

docker run \ --name fedora \ --hostname=ublk-docker.example.com \ --device=/dev/ublk-control \ --device-cgroup-rule='a : rmw' \ --tmpfs /tmp \ --tmpfs /run \ --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \ -ti \ fedora:38

.. code-block:: console

run the following commands inside the above container

dnf install -y git libtool automake autoconf g++ liburing-devel git clone https://github.com/ming1/ubdsrv.git cd ubdsrv autoreconf -i&& ./configure&& make -j 4&& make install

.. code-block:: console

docker exec -u 1001:1001 -ti fedora /bin/bash

.. code-block:: console

run the following commands inside the above container

bash-5.2$ ublk add -t null --unprivileged dev id 0: nr_hw_queues 1 queue_depth 128 block size 512 dev_capacity 524288000 max rq size 524288 daemon pid 178 flags 0x62 state LIVE ublkc: 237:0 ublkb: 259:1 owner: 1001:1001 queue 0: tid 179 affinity(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ) target {"dev_size":268435456000,"name":"null","type":0}

bash-5.2$ ls -l /dev/ublk* crw-rw-rw-. 1 root root 10, 123 May 1 04:35 /dev/ublk-control brwx------. 1 1001 1001 259, 1 May 1 04:36 /dev/ublkb0 crwx------. 1 1001 1001 237, 0 May 1 04:36 /dev/ublkc0

bash-5.2$ ublk del -n 0 bash-5.2$ ls -l /dev/ublk* crw-rw-rw-. 1 root root 10, 123 May 1 04:35 /dev/ublk-control

test

run all built tests

make test T=all

run test group

make test T=null

make test T=loop

make test T=generic

run single test

make test T=generic/001

make test T=null/001

make test T=loop/001 ...

run specified tests or test groups

make test T=generic:loop/001:null

Debug

ublksrv is running as one daemon process, so most of debug messages won't be shown in terminal. If any issue is observed, please collect log via command of "journalctl | grep ublksrvd"

./configure --enable-debug can build a debug version of ublk which dumps lots of runtime debug messages, and can't be used in production environment, should be for debug purpose only. For debug version of ublksrv, 'ublk add --debug_mask=0x{MASK}' can control which kind of debug log dumped, see UBLK_DBG_* defined in include/ublksrv_utils.h for each kind of debug log.

libublksrv API doc

API is documented in include/ublksrv.h, and doxygen doc can be generated by running 'make doxygen_doc', the generated html docs are in doc/html.

Contributing

Any kind of contribution is welcome!

Development is done over github.

Todo:

libublk

Move libublksrv out of ublksrv project, and make it as one standalone repo and name it as libublk.

It is planned to do it when ublk driver UAPI changes(feature addition) is slow down.

License

nlohmann(include/nlohmann/json.hpp) is from [#nlohmann]_, which is covered by MIT license.

The library functions (all code in lib/ directory and include/ublksrv.h) are covered by dual licensed LGPL and MIT, see COPYING.LGPL and LICENSE.

All other source code are covered by dual licensed GPL and MIT, see COPYING and LICENSE.

References

.. [#ublk_driver] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/ublk_drv.c?h=v6.0 .. [#zero_copy] https://lore.kernel.org/all/20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com/ .. [#nlohmann] https://github.com/nlohmann/json .. [#build_with_liburing_src] https://github.com/ming1/ubdsrv/blob/master/build_with_liburing_src .. [#stefan_container] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/