kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Btrfs filesystem show JSON format support #761

Open jelly opened 3 months ago

jelly commented 3 months ago

Introduce JSON format support for btrfs filesystem show. This information can be obtained in two ways: via an ioctl or by reading the block device. I have therefore taken the liberty of merging the print paths into a reusable function before adding JSON support.

I've chosen to format the JSON as I would have expected: an object with "metadata" for each filesystem, with the list of devices nested under device-list. For example, a single-device volume:

{
  "__header": {
    "version": "1"
  },
  "filesystem-list": [
    {
      "label": "fedora",
      "uuid": "cece4dd8-6168-4c88-a4a8-f7c51ed4f82b",
      "total_devices": 1,
      "used": 3334180864,
      "device-list": [
        {
          "devid": 1,
          "size": 0,
          "used": 0,
          "path": "/dev/sda5"
        }
      ]
    }
  ]
}
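For anyone consuming this output, here is a minimal Python sketch of walking the structure. It only assumes the key names visible in the sample above; the helper name list_devices is made up for illustration, and the sample is embedded as a string rather than taken from a live command.

```python
import json

# Sample copied from the single-device output shown above.
SAMPLE = """
{
  "__header": {
    "version": "1"
  },
  "filesystem-list": [
    {
      "label": "fedora",
      "uuid": "cece4dd8-6168-4c88-a4a8-f7c51ed4f82b",
      "total_devices": 1,
      "used": 3334180864,
      "device-list": [
        {
          "devid": 1,
          "size": 0,
          "used": 0,
          "path": "/dev/sda5"
        }
      ]
    }
  ]
}
"""


def list_devices(text):
    """Return one (label, uuid, devid, path) tuple per device.

    Hypothetical helper: it relies only on the keys present in the sample.
    """
    doc = json.loads(text)
    rows = []
    for fs in doc["filesystem-list"]:
        for dev in fs["device-list"]:
            rows.append((fs["label"], fs["uuid"], dev["devid"], dev["path"]))
    return rows


for row in list_devices(SAMPLE):
    print(row)
```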

There is one open issue: when an unmounted btrfs volume with multiple devices has one device missing, that device won't show up in the JSON output:

{
  "__header": {
    "version": "1"
  },
WARNING: warning, device 2 is missing

  "filesystem-list": [
    {
      "label": "raid1",
      "uuid": "698b3250-9424-46c4-af61-372cc5468780",
      "total_devices": 2,
      "used": 638779392,
      "device-list": [
        {
          "devid": 1,
          "size": 0,
          "used": 0,
          "path": "/dev/sdb"
        }
      ]
    }
  ]
}

I could add a missing_devices property to the btrfs filesystem information, but that would be inconsistent with the mounted missing-device output (where it is a missing property in the device object).
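To illustrate why the two shapes are awkward for consumers, here is a rough Python sketch of a reader that has to cope with both. The value types are assumptions, not something settled in this PR: missing is treated as a boolean on the device object, and missing_devices as a list of device IDs on the filesystem object (it could just as well end up being a count).

```python
def missing_devids(fs):
    """Collect missing device IDs from one entry of "filesystem-list".

    Assumed shapes (hypothetical, see the note above):
      - mounted case:   {"devid": 2, "missing": true, ...} in "device-list"
      - unmounted case: "missing_devices": [2] on the filesystem object
    """
    ids = set()
    for dev in fs.get("device-list", []):
        if dev.get("missing"):
            ids.add(dev["devid"])
    ids.update(fs.get("missing_devices", []))
    return sorted(ids)


# Two made-up filesystem objects, one per candidate shape.
mounted = {"device-list": [{"devid": 1}, {"devid": 2, "missing": True}]}
unmounted = {"device-list": [{"devid": 1}], "missing_devices": [2]}

print(missing_devids(mounted))
print(missing_devids(unmounted))
```

A single shape for both cases would let consumers drop one of the two branches.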

Is it possible to show which device is missing in the unmounted case, so that btrfs filesystem show would be consistent either way? (It seems the device ID is known; I'm not sure about the path.)

Zygo commented 4 weeks ago

Is it possible to show which device is missing in the unmounted case, so that btrfs filesystem show would be consistent either way?

Generally no, but there are a few exceptions. The ioctl and block methods build the device list from sources with different information availability.

The ioctl (mounted) method can simply read the chunk tree from the kernel and get all information about all devices except for the path, which is filled in by searching /dev and matching up devuuids (there's some udev in between, but /dev + brute-force search is the ultimate source of the information). This can build a complete list of missing device IDs, UUIDs, and sizes (which are stored in the metadata) but not paths (which are missing by definition) or any data that requires the path as an input (such as device slack or rotational status). Consistency is expected because btrfs wouldn't mount the filesystem otherwise (though the information could change while fi show is running, so inconsistency is still possible).
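As a rough illustration of that matching step, the Python sketch below joins a device list taken from the filesystem metadata with a devuuid-to-path map built by scanning /dev. All names (DeviceEntry, resolve_paths) and the sample UUIDs and sizes are hypothetical; this is not how btrfs-progs structures the code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DeviceEntry:
    devid: int
    devuuid: str
    size: int
    path: Optional[str]  # None when no block device matched the devuuid
    missing: bool


def resolve_paths(tree_devices: List[Tuple[int, str, int]],
                  scanned: Dict[str, str]) -> List[DeviceEntry]:
    """Attach paths to devices known from the metadata.

    tree_devices: (devid, devuuid, size) as recorded in the filesystem metadata.
    scanned: devuuid -> path, as found by searching /dev.
    A device with no matching devuuid is reported as missing; its devid,
    devuuid and size are still known.
    """
    result = []
    for devid, devuuid, size in tree_devices:
        path = scanned.get(devuuid)
        result.append(DeviceEntry(devid, devuuid, size, path, path is None))
    return result


# Made-up example: device 2 is recorded in the metadata but absent from /dev.
entries = resolve_paths(
    [(1, "uuid-a", 1000), (2, "uuid-b", 1000)],
    {"uuid-a": "/dev/sdb"},
)
for entry in entries:
    print(entry)
```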

The block device method reads the superblock of every device available in /dev, which tells it how many devices to expect per filesystem (num_devices) and the devid and devuuid of every device that is present. No information is available about devices that are missing. The number of missing devices is the difference between the number of devices expected and the number of devices found. If a device ID is found that is higher than num_devices, then you know that you can't tell which device IDs are missing from the available information. If num_devices is equal to the largest device ID, then the missing devices are any devid numbers between 1 and num_devices that are not found during the superblock search. If the highest device ID found is smaller than num_devices but some devices are missing, then you don't have enough information to know whether some of the missing devices have device IDs higher than num_devices. Inconsistent data is also possible, e.g. a device is removed from a filesystem while offline, then reappears with a superblock and metadata that result in num_devices being less than the number of devices present, or multiple values of num_devices for a single filesystem, or devices with the same fs uuid and devid but different devuuids.
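The case analysis above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this comment, not code from btrfs-progs; the names MissingInfo and infer_missing are invented, and the inputs are assumed to be num_devices from a superblock plus the set of devids found during the scan.

```python
from typing import Iterable, List, NamedTuple, Optional


class MissingInfo(NamedTuple):
    count: int                   # expected minus found; negative means inconsistent data
    devids: Optional[List[int]]  # None when the missing IDs cannot be determined


def infer_missing(num_devices: int, found_devids: Iterable[int]) -> MissingInfo:
    """Apply the superblock-only rules described above."""
    found = set(found_devids)
    if not found:
        raise ValueError("at least one superblock must have been found")
    count = num_devices - len(found)
    if count <= 0:
        # Nothing missing, or more devices present than expected (inconsistent).
        return MissingInfo(count, [])
    highest = max(found)
    if highest == num_devices:
        # Gaps between 1 and num_devices are taken to be the missing devices.
        gaps = sorted(set(range(1, num_devices + 1)) - found)
        return MissingInfo(count, gaps)
    # Highest found devid is above or below num_devices: IDs are unknown.
    return MissingInfo(count, None)


print(infer_missing(3, {1, 3}))  # gap at devid 2
print(infer_missing(3, {1, 2}))  # one missing, ID unknown
print(infer_missing(3, {1, 5}))  # devid above num_devices, ID unknown
```

Returning None for the ID list keeps "one device is missing but we cannot say which" distinct from "nothing is missing", which a JSON consumer would otherwise have no way to tell apart.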

For both methods to behave like the ioctl case, you'd have to read the device tree from the metadata to get the device list (i.e. emulate the ioctl in userspace). If too many devices are missing or there are inconsistent superblocks present, reading the tree won't be possible, and the block device method is the only option.