red-hat-storage / cephci

CEPH-CI is a framework for testing Red Hat Ceph Storage product builds
MIT License

EC 8+6 MSR rule test workflow changes #4034

Closed pdhiran closed 3 days ago

pdhiran commented 2 weeks ago

Summary of Changes in the PR

  1. Improved pool clean-up for EC pools: clean-up now also deletes the CRUSH rule and the EC profile created for each EC pool.
  2. Moved RBD init later in pool creation, because overwrites must be enabled on an EC pool before init is run for RBD.
  3. Added a new module for deleting EC profiles and for printing the CRUSH rule used when creating EC pools.
  4. Added support for the new "--yes-i-mean-it" flag, introduced in 8.0, for modifying EC pools.
  5. Updated a few commands to run on client nodes.
  6. Ceph status is now printed along with health detail in the finally block.
  7. Created new conf files for running 4-node EC pool tests in the OpenStack env on Squid & Reef: conf/reef/rados/4-node-ec-cluster-1-client.yaml & conf/squid/rados/4-node-ec-cluster-1-client.yaml
  8. Created a new conf file for running 3 AZ tests in the OpenStack env: conf/squid/rados/3AZ-cluster.yaml
  9. Added a new test case in all the releases, testing EC profile behaviour. Polarion ID: CEPH-83596295. The test code is in the pool_tests.py module.
  10. Updated the deployment suite for 4-node EC pool clusters to facilitate node removals & additions via CI in future.
  11. Added a deployment suite file for 3 AZ clusters with mon locations: suites/squid/rados/tier-3_rados_test-3-AZ-Cluster.yaml
  12. Modified the method get_device_path() to accommodate hosts running containers without a name or command. This was an issue in 4-node clusters, where containers like grafana, prometheus & alertmanager run colocated on OSD hosts.
  13. Automated 3 Bugzilla & tracker scenarios in the online reads balancer module, to check that the read suggestions are valid, are added to the osdmap, and are enforced on the cluster.
  14. Updated the log line to be identified during scrub / deep-scrub scenarios.
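
The ordering constraints in items 1 and 2 can be sketched as follows. This is only an illustration, not the code in this PR: the helper names `ec_rbd_pool_create_cmds` and `ec_pool_cleanup_cmds` are hypothetical and are not part of cephci, the `k=8 m=6` defaults are taken from the PR title, and the CRUSH rule name is assumed to be known to the caller. The helpers only build command strings; they do not run anything against a cluster.

```python
from typing import List


def ec_rbd_pool_create_cmds(pool: str, profile: str, k: int = 8, m: int = 6) -> List[str]:
    """Return, in order, the CLI commands to create an EC pool used for RBD.

    Hypothetical helper for illustration only (not a cephci API).
    """
    return [
        f"ceph osd erasure-code-profile set {profile} k={k} m={m}",
        f"ceph osd pool create {pool} erasure {profile}",
        # Overwrites are enabled before RBD init, as described in item 2.
        f"ceph osd pool set {pool} allow_ec_overwrites true",
        f"rbd pool init {pool}",
    ]


def ec_pool_cleanup_cmds(pool: str, profile: str, crush_rule: str) -> List[str]:
    """Return, in order, the CLI commands to remove an EC pool and its leftovers.

    Hypothetical helper for illustration only (not a cephci API).
    The pool goes first, because the CRUSH rule and the EC profile
    cannot be removed while a pool still references them.
    Assumes pool deletion is permitted on the cluster (mon_allow_pool_delete).
    """
    return [
        f"ceph osd pool rm {pool} {pool} --yes-i-really-really-mean-it",
        f"ceph osd crush rule rm {crush_rule}",
        f"ceph osd erasure-code-profile rm {profile}",
    ]


if __name__ == "__main__":
    # Pool, profile and rule names below are placeholders.
    for cmd in ec_rbd_pool_create_cmds("ec_pool", "ec_profile"):
        print(cmd)
    for cmd in ec_pool_cleanup_cmds("ec_pool", "ec_profile", "ec_rule"):
        print(cmd)
```

Keeping the sequence as plain data makes the ordering easy to assert in a unit test before any command is sent to a cluster.
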

neha-gangadhar commented 2 weeks ago

@pdhiran Please add a description, fix the tox issues, and attach logs, if any. A run for feature testing is fine as well.

pdhiran commented 1 week ago

Pass logs:

  1. 3 AZ cluster deployment : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-EPH160
  2. Balancer changes : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-8CKK0Z
  3. EC profile tests : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-XQ336H

EC 8+6 Pass logs : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-QSJ4QN/

Failure details :

  1. [EC Pool Recovery Improvement] -> Active bug : https://bugzilla.redhat.com/show_bug.cgi?id=2305520

  2. [ceph-bluestore-tool utility] -> Active bug : https://bugzilla.redhat.com/show_bug.cgi?id=2310344

  3. [EC pool with Overwrites] -> Active bug : https://bugzilla.redhat.com/show_bug.cgi?id=2305966

  4. [Inconsistent objects in EC pool functionality check] -> Socket timeout hit, so the test was rerun. Pass log : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-HHATX6/Inconsistent_objects_in_EC_pool_functionality_check_0.log

  5. [Verify scrub logs] -> Intermittent failure, no journalctl logs found, so the test was rerun. Pass log : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-GHRXR3/Verify_scrub_logs_0.log

pdhiran commented 6 days ago

> @pdhiran Please add a description, fix the tox issues, and attach logs, if any. A run for feature testing is fine as well.

Fixed the issues and added a summary of all the changes made in the PR.

SrinivasaBharath commented 5 days ago

@pdhiran, please provide the Bugzilla and tracker details for the following task:

> Automated 3 Bugzilla & tracker scenarios in the online reads balancer module, to check that the read suggestions are valid, are added to the osdmap, and are enforced on the cluster.

pdhiran commented 3 days ago

> @pdhiran, please provide the Bugzilla and tracker details for the following task:
>
> Automated 3 Bugzilla & tracker scenarios in the online reads balancer module, to check that the read suggestions are valid, are added to the osdmap, and are enforced on the cluster.

Added the Bugzilla details.

openshift-ci[bot] commented 3 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: neha-gangadhar, pdhiran

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/red-hat-storage/cephci/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment