linux-system-roles / storage

Ansible role for linux storage management
https://linux-system-roles.github.io/storage/
MIT License
99 stars 55 forks source link

RFE: Support fs_type: gfs2 #417

Closed andyprice closed 5 months ago

andyprice commented 5 months ago

As https://github.com/linux-system-roles/storage/issues/341 has been resolved, it would be useful to be able to create a gfs2 filesystem with the storage role. The current difficulty for the gfs2 role is that the storage role does not allow creating a volume without a filesystem, so the gfs2 role would have to allow an xfs filesystem to be created and then overwrite it with a gfs2 filesystem, which is not ideal, particularly for idempotency.

So it seems the best option* is for the storage role to create the filesystem with fs_type: gfs2 and accept the special options that mkfs.gfs2 requires. The existing fs_create_options option should be sufficient for this, e.g. fs_create_options: -j 2 -t mycluster:myfs

For single-node, not-clustered mount testing (only) you could use fs_create_options: -p lock_nolock

I don't know if blivet would need changes to support this, but there appears to be at least a stub of support for gfs2 already: https://github.com/storaged-project/blivet/blob/master/blivet/formats/fs.py#L1044

* If the storage role does not manage idempotency for filesystem creation, it would be preferable (to avoid data loss) for the gfs2 role if the storage role instead allowed fs_type: none so that the gfs2 role can manage it itself.

richm commented 5 months ago

What about https://github.com/linux-system-roles/storage/pull/400 - this allows you to create volumes without a filesystem?

andyprice commented 5 months ago

What about #400 - this allows you to create volumes without a filesystem?

Ah I didn't hear about #400, I think I need to update. That would suffice but it would still be cleaner to be able to create the fs at the same time as creating the shared volumes, if possible.

richm commented 5 months ago

What about #400 - this allows you to create volumes without a filesystem?

Ah I didn't hear about #400, I think I need to update. That would suffice but it would still be cleaner to be able to create the fs at the same time as creating the shared volumes, if possible.

Agreed - sounds like it would make sense to have fs_type: gfs2 and be able to pass gfs2 specific options in the storage role.

vojtechtrefny commented 5 months ago

I don't know if blivet would need changes to support this, but there appears to be at least a stub of support for gfs2 already

Looks like all we need to do is set the gfs2 flag to mark the filesystem as supported, this can be done from the role. I'll check this a bit more and if this is all that's needed for, I should be able to prepare a fix this week.

richm commented 5 months ago

@andyprice will this suffice? https://github.com/linux-system-roles/storage/pull/418

andyprice commented 5 months ago

@andyprice will this suffice? #418

Using:

  - name: Create shared volume group
    include_role:
      name: linux-system-roles.storage
    vars:
      storage_pools:
        - name: "{{ fs.vg }}"
          disks: "{{ fs.pvs }}"
          type: lvm 
          shared: true
          state: present
          volumes:
            - name: "{{ fs.lv }}"
              size: "{{ fs.lv_size }}"
              fs_type: gfs2
              fs_create_options: "-D -t {{ gfs2_cluster_name }}:{{ fs.name }} -j {{ fs.journals | default(gfs2_default_journals) }} -U {{ fs.uuid }}"
    run_once: true

It succeeds on the first run but running a second time results in blivet.errors.FSError: no application to set label for filesystem gfs2. The full ugly error message:

Failed message...
  fedora1 failed | msg: {'failed': True, 'module_stdout': 'failed to load module nvdimm: libbd_nvdimm.so.3: cannot open shared object file: No such file or directory\r\nTraceback (most recent call last):\r\n  File "/root/.ansible/tmp/ansible-tmp-1706025064.9075828-37501-216582917009709/AnsiballZ_blivet.py", line 107, in <module>\r\n    _ansiballz_main()\r\n  File "/root/.ansible/tmp/ansible-tmp-1706025064.9075828-37501-216582917009709/AnsiballZ_blivet.py", line 99, in _ansiballz_main\r\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n  File "/root/.ansible/tmp/ansible-tmp-1706025064.9075828-37501-216582917009709/AnsiballZ_blivet.py", line 47, in invoke_module\r\n    runpy.run_module(mod_name=\'ansible.modules.blivet\', init_globals=dict(_module_fqn=\'ansible.modules.blivet\', _modlib_path=modlib_path),\r\n  File "<frozen runpy>", line 226, in run_module\r\n  File "<frozen runpy>", line 98, in _run_module_code\r\n  File "<frozen runpy>", line 88, in _run_code\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 2307, in <module>\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 2303, in main\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 2255, in run_module\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 1849, in manage_pool\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 1590, in manage\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 1561, in _manage_volumes\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 872, in manage\r\n  File "/tmp/ansible_blivet_payload_wwoceeiq/ansible_blivet_payload.zip/ansible/modules/blivet.py", line 834, in _reformat\r\n  File "/usr/lib/python3.12/site-packages/blivet/threads.py", line 53, in run_with_lock\r\n    return m(*args, **kwargs)\r\n           ^^^^^^^^^^^^^^^^^^\r\n  File "/usr/lib/python3.12/site-packages/blivet/deviceaction.py", line 1106, in __init__\r\n    self._execute(dry_run=True)\r\n  File "/usr/lib/python3.12/site-packages/blivet/threads.py", line 53, in run_with_lock\r\n    return m(*args, **kwargs)\r\n           ^^^^^^^^^^^^^^^^^^\r\n  File "/usr/lib/python3.12/site-packages/blivet/formats/fs.py", line 678, in write_label\r\n    raise FSError("no application to set label for filesystem %s" % self.type)\r\nblivet.errors.FSError: no application to set label for filesystem gfs2\r\n', 'module_stderr': 'Shared connection to fedora1 closed.\r\n', 'msg': 'MODULE FAILURE\nSee stdout/stderr for the exact error', 'rc': 1, '_ansible_no_log': None, 'changed': False}
vojtechtrefny commented 5 months ago

The problem is that -t sets label for the GFS2 filesystem and during the second run we see the empty fs_label attribute and a filesystem with label so we try to remove the label which is not supported by blivet. This means we'll also need a change in blivet to support labels for GFS2 and in then instead of using -t in fs_create_options you'll need to use fs_label: {{ gfs2_cluster_name }}:{{ fs.name }}.

andyprice commented 5 months ago

Adding fs_label: "{{ gfs2_cluster_name }}:{{ fs.name }}" works. Thanks. (The -t option is required in the supported case but they seem to work fine together). Can #418 be merged without the label support in blivet?

vojtechtrefny commented 5 months ago

Interesting, I'd expected that using fs_label now would fail and tell you that we don't support setting label on GFS2 (because blivet doesn't support that). I'd like to fix blivet first before merging #418 so that using the fs_label attribute in the role would set the -t option in blivet and not having to duplicate it. It will be a small change in blivet and we should be able to get it to RHEL relatively quickly.

andyprice commented 5 months ago

So the "label" that gfs2 presents is the locking table name that it uses for dlm locking, it's set with mkfs.gfs2's -t option. Perhaps on the second run blivet knows that it doesn't need to modify the label because it matches the -t option that the fs was created with, and that's why there's no error?

vojtechtrefny commented 5 months ago

Yes, I understand, but it should fail on the first run now, because you are telling the role to create GFS2 with a label and we don't know how to do that. It will be created with that label because of the -t option, but blivet doesn't know that and should return error.

vojtechtrefny commented 5 months ago

@dwlehman actually pointed out the problem is different -- the role shouldn't try to change the label when running for the second time. I've updated the PR with a second fix for the logic around default value for fs_label when not set. The test now runs ok with fs_create_options: -j 2 -t rhel9-1node:myfs set so hopefully the new version now works for you as well. sorry for the noise.

andyprice commented 5 months ago

With the latest changes in https://github.com/linux-system-roles/storage/pull/418 I can run my test, with fs_label removed, multiple times with no errors. Thanks.