rockstor / rockstor-core

Linux/BTRFS based Network Attached Storage(NAS)
http://rockstor.com/docs/contribute_section.html
GNU General Public License v3.0
558 stars 138 forks source link

Un special-case system drive btrfs-in-partition treatment #2824 #2830

Closed phillxnet closed 7 months ago

phillxnet commented 7 months ago

Removing legacy system drive special treatment that was originally intended solely to present a partitioned system pool member as if it was a whole disk: the only entity we understood prior to our data pool btrfs-in-partition capability. This normalises/simplifies our lowest level scan_disks() procedure.

Includes: Update all scan_disks() unittest to reflect:

Fixes #2824 Fixes #2749

--

Work in progress:

All osi tests now modified and all osi tests now pass again. But we have some higher level special-cases for the system drive that are to be addressed later in this draft PR. There also remains some likely straggler/strange/redundant logic in scan_disks(); but given all osi tests now pass, these lower-level clean-ups can be addressed after the whole-system viability of this approach is fully assessed: via higher level artefacts examination.

phillxnet commented 7 months ago

OSI tests

lbuildvm:/opt/rockstor/src/rockstor/system/tests # poetry run django-admin test -v 2 -p test_osi*
Found 20 test(s).
Skipping setup of unused database(s): default, smart_manager.
System check identified no issues (0 silenced).
test_get_byid_name_map (rockstor.system.tests.test_osi.OSITests.test_get_byid_name_map)
Test get_byid_name_map() for expected by-id device name mapping. ... ok
test_get_byid_name_map_no_byid_dir (rockstor.system.tests.test_osi.OSITests.test_get_byid_name_map_no_byid_dir)
Test get_byid_name_map() for non-existent by-id dir behaviour. ... ok
test_get_byid_name_map_prior_command_mock (rockstor.system.tests.test_osi.OSITests.test_get_byid_name_map_prior_command_mock)
Test get_byid_name_map() for prior mapping between canonical ... ok
test_get_dev_byid_name (rockstor.system.tests.test_osi.OSITests.test_get_dev_byid_name)
Test get_dev_byid_name() across a range of inputs. ... ok
test_get_dev_byid_name_no_devlinks (rockstor.system.tests.test_osi.OSITests.test_get_dev_byid_name_no_devlinks)
Test as yet un-observed circumstance of no DEVLINKS entry for: ... ok
test_get_dev_byid_name_node_not_found (rockstor.system.tests.test_osi.OSITests.test_get_dev_byid_name_node_not_found)
test get_dev_byid_name when supplied dev name not found ... ok
test_replace_pattern_inline (rockstor.system.tests.test_osi.OSITests.test_replace_pattern_inline)
This function is only used during initrock.establish_poetry_paths() for now ... ok
test_root_disk (rockstor.system.tests.test_osi.OSITests.test_root_disk)
Test root_disk() for expected function of identifying the base device name for the ... ok
test_run_command (rockstor.system.tests.test_osi.OSITests.test_run_command) ... ok
test_scan_disks_27_plus_disks_regression_issue (rockstor.system.tests.test_osi.OSITests.test_scan_disks_27_plus_disks_regression_issue)
Suspected minimum disk set to trigger /dev/sda /dev/sda[a-z] serial bug ... ok
test_scan_disks_btrfs_in_partition (rockstor.system.tests.test_osi.OSITests.test_scan_disks_btrfs_in_partition)
Test btrfs in partition on otherwise generic install. System disk sda ... ok
test_scan_disks_dell_perk_h710_md1220_36_disks (rockstor.system.tests.test_osi.OSITests.test_scan_disks_dell_perk_h710_md1220_36_disks)
Test scan_disks() with Direct attach storage shelf (Dell MD1220). ... ok
test_scan_disks_intel_bios_raid_data_disk (rockstor.system.tests.test_osi.OSITests.test_scan_disks_intel_bios_raid_data_disk)
Intel motherboard based firmware raid data disk identification. ... ok
test_scan_disks_intel_bios_raid_sys_disk (rockstor.system.tests.test_osi.OSITests.test_scan_disks_intel_bios_raid_sys_disk)
Intel motherboard based firmware raid sys disk install, Essentially ... ok
test_scan_disks_luks_on_bcache (rockstor.system.tests.test_osi.OSITests.test_scan_disks_luks_on_bcache)
Test scan_disks() across a variety of mocked lsblk output. ... ok
test_scan_disks_luks_sys_disk (rockstor.system.tests.test_osi.OSITests.test_scan_disks_luks_sys_disk)
Test to ensure scan_disks() correctly identifies the CentOS, default ... ok
test_scan_disks_mdraid_sys_disk (rockstor.system.tests.test_osi.OSITests.test_scan_disks_mdraid_sys_disk)
Test of scan_disks() on a system installed as per the: ... ok
test_scan_disks_nvme_sys_disk (rockstor.system.tests.test_osi.OSITests.test_scan_disks_nvme_sys_disk)
Post pr #1925 https://github.com/rockstor/rockstor-core/pull/1946 ... ok
test_scan_disks_root_miss_attribution (rockstor.system.tests.test_osi.OSITests.test_scan_disks_root_miss_attribution)
Miss-attribution of a data drive to the ROOT pool. ... ok
test_scan_disks_root_miss_attribution_min_reproducer (rockstor.system.tests.test_osi.OSITests.test_scan_disks_root_miss_attribution_min_reproducer)
Miss-attribution of a data drive to the ROOT pool. ... ok

----------------------------------------------------------------------
Ran 20 tests in 0.087s

OK

All tests

lbuildvm:/opt/rockstor/src/rockstor # poetry run django-admin test -v 2
...
----------------------------------------------------------------------
Ran 282 tests in 29.968s

OK
phillxnet commented 7 months ago

Building and installing an rpm form this branch and we have on first setup (with debug):

[06/Apr/2024 10:40:05] DEBUG [system.osi:344] Scan_disks() using: blk_dev_properties={'name': '/dev/sda3', 'model': None, 'serial': None, 'size': '24G', 'transport': None, 'vendor': None, 'hctl': None, 'type': 'part', 'fstype': 'btrfs', 'label': 'ROOT', 'uuid': '4d866326-2a23-4f01-b81f-4f9e3a811203'}
[06/Apr/2024 10:40:05] DEBUG [system.osi:423] (/dev/sda) parent of partition (/dev/sda3)
[06/Apr/2024 10:40:05] DEBUG [system.osi:506] -- Skipping btrfs partition entry -
[06/Apr/2024 10:40:05] DEBUG [system.osi:344] Scan_disks() using: blk_dev_properties={'name': '/dev/sdb', 'model': 'QEMU HARDDISK', 'serial': 'QM00003', 'size': '5G', 'transport': 'sata', 'vendor': 'ATA', 'hctl': '2:0:0:0', 'type': 'disk', 'fstype': 'btrfs', 'label': 'test-pool', 'uuid': 'e0a48f0a-0884-4b7e-b681-88e70823a3df'}
[06/Apr/2024 10:40:05] DEBUG [system.osi:344] Scan_disks() using: blk_dev_properties={'name': '/dev/sdc', 'model': 'QEMU HARDDISK', 'serial': 'QM00005', 'size': '5G', 'transport': 'sata', 'vendor': 'ATA', 'hctl': '3:0:0:0', 'type': 'disk', 'fstype': 'btrfs', 'label': '123', 'uuid': '5d96bfbc-47b9-4e5a-b64a-9088a7111bf7'}
[06/Apr/2024 10:40:05] DEBUG [storageadmin.views.disk:416] ++++ Creating special system pool db entry.

But on the next call our special case system pool update then fails:

[06/Apr/2024 10:40:09] DEBUG [system.osi:423] (/dev/sda) parent of partition (/dev/sda3)
[06/Apr/2024 10:40:09] DEBUG [system.osi:506] -- Skipping btrfs partition entry -
[06/Apr/2024 10:40:09] DEBUG [system.osi:344] Scan_disks() using: blk_dev_properties={'name': '/dev/sdb', 'model': 'QEMU HARDDISK', 'serial': 'QM00003', 'size': '5G', 'transport': 'sata', 'vendor': 'ATA', 'hctl': '2:0:0:0', 'type': 'disk', 'fstype': 'btrfs', 'label': 'test-pool', 'uuid': 'e0a48f0a-0884-4b7e-b681-88e70823a3df'}
[06/Apr/2024 10:40:09] DEBUG [system.osi:344] Scan_disks() using: blk_dev_properties={'name': '/dev/sdc', 'model': 'QEMU HARDDISK', 'serial': 'QM00005', 'size': '5G', 'transport': 'sata', 'vendor': 'ATA', 'hctl': '3:0:0:0', 'type': 'disk', 'fstype': 'btrfs', 'label': '123', 'uuid': '5d96bfbc-47b9-4e5a-b64a-9088a7111bf7'}
[06/Apr/2024 10:40:09] ERROR [storageadmin.views.command:135] Exception while refreshing state for Pool(ROOT). Moving on: '/dev/sda'
[06/Apr/2024 10:40:09] ERROR [storageadmin.views.command:139] '/dev/sda'
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 123, in _refresh_pool_state
    pool_info = dev_pool_info[dev_tmp_name]
                ~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: '/dev/sda'
[06/Apr/2024 10:40:09] DEBUG [fs.btrfs:1005] Skipping excluded subvol: name=(@).
[06/Apr/2024 10:40:09] DEBUG [fs.btrfs:1005] Skipping excluded subvol: name=(.snapshots).
[06/Apr/2024 10:40:09] DEBUG [fs.btrfs:1005] Skipping excluded subvol: name=(.snapshots/1/snapshot).

Our DB disk.role entry for the system disk, after the above, is:

{"root": "btrfs", "partitions": {"ata-QEMU_HARDDISK_QM00001-part3": "btrfs"}}

And these are both scan_disks() assigned roles:

https://github.com/rockstor/rockstor-core/blob/6c6acb437a8f19108419390ea3df3563f8d2689a/src/rockstor/storageadmin/views/disk.py#L78-L92

And we have a successful by-id transition for the partition name:

https://github.com/rockstor/rockstor-core/blob/6c6acb437a8f19108419390ea3df3563f8d2689a/src/rockstor/storageadmin/views/disk.py#L341-L352

phillxnet commented 7 months ago

More logging on command.py's pool refresh call to get_dev_pool_info() returns our expected /dev/sda3

[06/Apr/2024 12:12:33] DEBUG [fs.btrfs:519] get_dev_pool_info() returning {'/dev/sda3': DevPoolInfo(devid=1, size=25128940.0, allocated=8454144.0, uuid='4d866326-2a23-4f01-b81f-4f9e3a811203', label='ROOT'), '/dev/sdb': DevPoolInfo(devid=1, size=5242880.0, allocated=3735552.0, uuid='e0a48f0a-0884-4b7e-b681-88e70823a3df', label='test-pool'), '/dev/sdc': DevPoolInfo(devid=1, size=5242880.0, allocated=1114112.0, uuid='5d96bfbc-47b9-4e5a-b64a-9088a7111bf7', label='123')}

[06/Apr/2024 12:12:33] ERROR [storageadmin.views.command:135] Exception while refreshing state for Pool(ROOT). Moving on: '/dev/sda'
[06/Apr/2024 12:12:33] ERROR [storageadmin.views.command:139] '/dev/sda'
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 123, in _refresh_pool_state
    pool_info = dev_pool_info[dev_tmp_name]
                ~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: '/dev/sda'
phillxnet commented 7 months ago

We now have our parent (rather than special part) system drive reference:

un-special-system-disk

And the automated addition, for system disk only, of a redirect role (required for our generic btrfs-in-partition)

{"redirect": "ata-QEMU_HARDDISK_QM00001-part3", "root": "btrfs", "partitions": {"ata-QEMU_HARDDISK_QM00001-part3": "btrfs"}}

Code uncommitted as still very much in development.

Moving to adding system drive import capability, as give we no longer special case the system disk/pool (mostly for latter), it defaults as un-imported. Which should speed up the majority of installs..

phillxnet commented 7 months ago

We now have the following behaviour in this branch (fresh installs only)

Import-on-system-disk

Post a working import (Advanced users only) we have:

Imported-system-disk

N.B. and to keep things simple from a user/development perspective: we also have no role editing on the new un special-case system drive (so a little special case still :) ). The system drive and pool are necessarily special to an extent as they are the only ones we do not manage from the ground-up. Our installer, via fstab, manages the setup/config of the system drive and pool: what we are after here is for our interoperability with these to add as little complexity/confusion as possible to our data pool/s management code. And to facilitate what intervention/presentation we can on the btrfs parts of the system drive/pool.

I'm quite liking the default un-imported / non Rockstor managed state of the system drive/pool by default myself. An artefact of the un special-case approach here. This gives the vast majority of installs less over-all complexity from a user perspective. And helps folks avoid inadvertently mixing system and data storage. We have always encouraged this separation of concerns, but with no system pool by default; we are likely more beginner friendly out-of-the-box. But with the import option we maintain the prior flexibility.

phillxnet commented 7 months ago

Outstanding issues:

Otherwise the resulting "Create Share" button/page results in a spinner-waiting icon overlaid: as it has no Pools to populate it's "Pool" drop-down.

In contrast: Stroage - Pools: is as expected "No Pools have been created 'Create Pool` " button is presented.

phillxnet commented 7 months ago

We now have, in this PR branch, the following on the Pool & Share overview pages:

Pools Overview

No managed Pool - disks available

Pools-page-no-pools-but-disks-available

Managed Pool - disks available

Pools-page-with-pool-and-disks-available

Managed Pool - disks unavailable

Pools-page-with-pool-but-no-disks-available

Share Overview

No Shares - no managed Pool

Share-page-no-shares-or-pools

No Shares

(unchanged from prior to this PR) Share-page-no-shares

phillxnet commented 7 months ago

And we now have consistency re the Web-UI on the disk page re mapping indication to a Pool:

Disks-page-all-mapped-to-a-pool

phillxnet commented 7 months ago

Caveat

When importing the 'ROOT' labelled system pool on a custom install: i.e. not from our installers, where the root system is also not a boot to snapshot: we get the following fail/error:

 Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/disk.py", line 789, in _btrfs_disk_import
    import_shares(po, request)
  File "/opt/rockstor/src/rockstor/storageadmin/views/share_helpers.py", line 95, in import_shares
    shares_in_pool = shares_info(pool)
                     ^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1010, in shares_info
    s_name, writable, is_clone = parse_snap_details(
                                 ^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 1055, in parse_snap_details
    writable = not get_property(full_snap_path, "ro")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2346, in get_property
    o, e, rc = run_command(cmd)
               ^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 293, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property get /mnt2/ROOT/.snapshots/1/snapshot ro. rc = 1. stdout = ['']. stderr = ['ERROR: failed to open /mnt2/ROOT/.snapshots/1/snapshot: No such file or directory', 'ERROR: failed to detect object type: No such file or directory', '']

As this has not been seen (yet) on a system resulting from our official installers, it will not be addressed. Noting here in case there is a miss-attribution of this outstanding issue to a non-boot-to-snapshot configuration. All our installers (since our move to a Built on openSUSE arrangement): have been pre-configured to be boot-to-snapshot for they system pool: irrespective of system drive size.


[EDIT] This has now been seen in boot-to-snapshot installs on the latest version of the development in this Draft PR. And has been split out into the following issue:

ROOT pool import failure (Advanced Users Only) #2832

phillxnet commented 7 months ago

4.6.1-0 (Uses Leap 15.4 but originally installed from a 15.3 based official stable installer) showed the following on the Disks overview page prior to a Web-UI update:

4.6.1-0 Pre PR

(dup from 15.3 to 15.4) 4 6 1-0-disks-page

Post PR

Web-UI updated to a 5.0.8-2830 testing rpm (unpublished)

4 6 1-0_to_5 0 8-2830-Web-UI_update-disks-page

So we successfully transition from our last Stable with regard to the changes in this PR.

[EDIT] After a required Crt + Shift + R (as suggested on update page) we now have the additional tags as well: give they relate to client side changes in the template:

4 6 1-0_to_5 0 8-2830-Web-UI_update-disks-page-after-browser-refresh