red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/

ocs-ci performance test fails to find image #4928

Closed: gitsridhar closed this issue 2 years ago

gitsridhar commented 2 years ago

I tried to execute tests/e2e/performance/test_fio_benchmark.py::TestFIOBenchmark::test_fio_workload_simple[CephBlockPool-sequential]

and it failed to find the elastic-operator-0 pod:

oc describe pod elastic-operator-0 -n elastic-system

Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       2m16s                default-scheduler  Successfully assigned elastic-system/elastic-operator-0 to rdr-sri-26f5-syd04-worker-2
  Normal   AddedInterface  2m15s                multus             Add eth0 [10.129.2.28/23] from openshift-sdn
  Normal   Pulling         45s (x4 over 2m14s)  kubelet            Pulling image "docker.elastic.co/eck/eck-operator:1.7.1"
  Warning  Failed          42s (x4 over 2m12s)  kubelet            Failed to pull image "docker.elastic.co/eck/eck-operator:1.7.1": rpc error: code = Unknown desc = choosing image instance: no image found in manifest list for architecture ppc64le, variant "", OS linux
  Warning  Failed          42s (x4 over 2m12s)  kubelet            Error: ErrImagePull
  Warning  Failed          16s (x6 over 2m12s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff         3s (x7 over 2m12s)   kubelet            Back-off pulling image "docker.elastic.co/eck/eck-operator:1.7.1"

This is happening in a ppc64le environment. Is there a workaround?
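One way to confirm that the image manifest really has no ppc64le entry (a quick sketch, assuming skopeo and jq are available on a workstation with registry access) is to inspect the manifest list:

  # Hypothetical check: list the architectures published for the operator image
  skopeo inspect --raw docker://docker.elastic.co/eck/eck-operator:1.7.1 | jq '.manifests[].platform'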

Avilir commented 2 years ago

Hi,

There are two performance tests, tests/e2e/performance/test_fio_benchmark.py and tests/e2e/performance/test_small_file_workload.py, which require an elasticsearch server to be available to the test pods running in the cluster.

This elasticsearch server is deployed on the cluster by the test, but that works only on the x86_64 architecture. For other architectures such as ppc64le and s390x (IBM Z), you need an elasticsearch server outside the cluster and this section in the configuration file:

PERF:
  deploy_internal_es: false
  internal_es_server: "x.x.x.x"    # <- change this IP
  internal_es_port: 9200           # change if you are not using the default
  production_es: true
  production_es_server: "x.x.x.x"  # <- change this IP
  production_es_port: 9200         # change if you are not using the default
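
Once the external server is running, it is worth verifying that it is reachable from inside the cluster before starting the test, for example (a rough sketch; the pod name and image are arbitrary choices, adjust to your environment):

  # Hypothetical reachability check; replace x.x.x.x with your elasticsearch address
  oc run es-check --rm -i --restart=Never \
    --image=registry.access.redhat.com/ubi8/ubi -- \
    curl -s http://x.x.x.x:9200
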
gitsridhar commented 2 years ago

Our ocs-ci conf section for PERF looks like this:

PERF:
  production_es: true
  deploy_internal_es: false
  internal_es_server: 172.30.64.155
  production_es_server: 172.30.64.155

We have deployed elasticsearch on 172.30.64.155. But the test case

/root/venv/bin/python3.8 /root/venv/bin/run-ci -m performance --cluster-name ocstest \
  --ocp-version 4.9 --ocs-version=4.9 \
  --ocsci-conf conf/ocsci/production_powervs_upi.yaml \
  --ocsci-conf conf/ocsci/lso_enable_rotational_disks.yaml \
  --ocsci-conf conf/ocsci/manual_subscription_plan_approval.yaml \
  --ocsci-conf conf/examples/monitoring.yaml \
  --ocsci-conf /root/ocs-ci-conf.yaml \
  --cluster-path /root \
  tests/e2e/performance/test_pvc_creation_deletion_performance.py::TestPVCCreationDeletionPerformance::test_pvc_creation_deletion_measurement_performance[CephBlockPoolThick-15Gi]

is trying to push data to an elasticsearch server with the wrong IP address:

15:59:05 - MainThread - ocs_ci.utility.performance_dashboard - INFO - Trying to push {'json': '[{"commitid": "4.9.0-0.nightly-ppc64le-2021-09-09-145007", "project": "OCS4.9", "branch": "stable-4.9", "executable": "4.9", "benchmark": "cephfs-pvc-1-pvc-creation-time", "environment": "POWERVS ", "result_value": 181.3977002}]'} to codespeed server: http://10.0.78.167:8000/result/add/json/

and it fails. I see 10.0.78.167 mentioned in:

  ./ocs_ci/framework/conf/default_config.yaml:227:    production_es_server: "10.0.78.167"
  ./ocs_ci/ocs/constants.py:188:CODESPEED_URL = "http://10.0.78.167:8000/"

Avilir commented 2 years ago

This is not an elasticsearch issue, it is a codespeed issue, and we need to remove codespeed since we are not using it anymore.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.