Align Ceph deployment to OSP 17

fmount commented 2 years ago

This patch adds a new Ceph role which is responsible to execute a set of tasks and deploy the cluster in the standalone scenario. This works aligns dev-install to the new approach introduced in Wallaby/Zed, where the Ceph bootstrap is not part of the overcloud deploy anymore.

In particular, as per [1], Ceph expects to see:

an already provisioned metal
an already provisioned storage_network
reserved VIPs (if needed)

There are still actions happening during the overcloud deployment (pool creation, ganesha deployment) to finalize the cloud, but the initial bootstrap process has been moved to a set of overcloud ceph related commands, executed through the related tripleo.operator.

Note that a similar approach has been recently introduced in oooq [2].

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#deployed-ceph-workflow [2] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/standalone/tasks/ceph-install.yml

Signed-off-by: Francesco Pantano fpantano@redhat.com

fmount commented 2 years ago

@gouthampacha FYI

mdbooth commented 2 years ago

This looks awesome. Does it still work for 16.2?

fmount commented 2 years ago

This looks awesome. Does it still work for 16.2?

I didn't try to deploy against a 16.2 compose, but I'm assuming it should work because in that case the cephadm role is skipped (as the entire process happens during the regular overcloud deployment). It's worth trying this code against a 16.2 compose though, just to make sure we're not introducing particular issues.

fmount commented 2 years ago

This looks awesome. Does it still work for 16.2?

I didn't try to deploy against a 16.2 compose, but I'm assuming it should work because in that case the cephadm role is skipped (as the entire process happens during the regular overcloud deployment). It's worth trying this code against a 16.2 compose though, just to make sure we're not introducing particular issues.

Quick update here:

In a 16.2 env I tried I definitely see:

[stack@standalone ~]$ rpm -qa | grep ceph-ansible ceph-ansible-4.0.70.3-1.el8cp.noarch

[stack@standalone ~]$ cat standalone_parameters.yaml
...
...
...
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    devices:
    - /dev/sdb
    - /dev/sdc
    - /dev/sdd
  CephAnsibleExtraConfig:
    cluster_network: 192.168.24.0/24
    public_network: 192.168.24.0/24
    ceph_nfs_bind_addr: "10.1.27.25"
    ceph_nfs_docker_extra_env: "--pids-limit=0"

  CephPoolDefaultPgNum: 32
  CephPoolDefaultSize: 1
...
...

and the overcloud deploy command looks like:

[stack@standalone ~]$ cat tripleo_deploy.sh | grep ceph-ansible
sudo openstack tripleo deploy  --templates $DEPLOY_TEMPLATES --standalone  --yes --output-dir $DEPLOY_OUTPUT_DIR  --stack $DEPLOY_STACK --standalone-role $DEPLOY_STANDALONE_ROLE --timeout $DEPLOY_TIMEOUT_ARG -e /usr/share/openstack-tripleo-heat-templates/environments/standalone/standalone-tripleo.yaml -e /home/stack/containers-prepare-parameters.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/external-network-vip.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsganesha-config.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml -e /home/stack/standalone_parameters.yaml -r $DEPLOY_ROLES_FILE -n $DEPLOY_NETWORKS_FILE     --deployment-user $DEPLOY_DEPLOYMENT_USER  --local-ip $DEPLOY_LOCAL_IP --control-virtual-ip $DEPLOY_CONTROL_VIP --public-virtual-ip $DEPLOY_PUBLIC_VIP    --keep-running     >/home/stack/standalone_deploy.log 2>&1

where ceph-ansible is included.

Ceph is Nautilus:

[root@standalone /]# ceph versions
{
    "mon": {
        "ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)": 1
    },
    "mgr": {
        "ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-199.el8cp (f5470cbfb5a4dac5925284cef1215f3e4e191a38) nautilus (stable)": 6
    }
}

and I see a live execution running w/o issues [0], so it looks safe to me in terms of backwards compatibility.

[0] https://paste.opendev.org/show/bSkRuAMB4LJdrQwCiDQ1/

EmilienM commented 2 years ago

A few notes:

we'll need to add ContainerImageRegistryCredentials to containers-prepare-parameters.yaml somehow.
to deploy OSP17 we need to stop managing container-tools and virt modules.

vkmc commented 1 year ago

Redeploying OSP 17.0 with dev-install, I noticed this patch haven't merged yet. What's the status of this?