Open akrzos opened 5 years ago
FYI @chaitanyaenr
@jmencak @sjug @mffiedler @chaitanyaenr Hey guys, can you provide some feedback on this approach to improving our tooling and potentially running workloads? (An example is the included configurable NodeVertical job, which can run with or without pbench agents and is kicked off simply with ansible-playbook.)
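For a concrete picture, here is a minimal, hypothetical extra-vars sketch of how such a run might be toggled; the file name is made up, and the variable names are the ones that come up later in this thread:

```yaml
# nodevertical-vars.yml -- hypothetical extra-vars file for a NodeVertical run.
# Values are illustrative only.
enable_pbench_agents: false      # run the workload without pbench agents
workload_job_privileged: false   # do not run the workload job as a privileged pod
```

Something like this could then be passed to ansible-playbook with `-e @nodevertical-vars.yml`, keeping a single playbook invocation for both modes.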
Testing right now. The first thing I've noticed is that openshift-install logs no longer go to OPENSHIFT_INSTALL_LOG_DIR
that the user defines. Could be unrelated to this PR though.
Looks good to me, nothing else to add that we didn't already discuss. Once the CL fixes merge, that will fix the directory/path issues you're working around in the respective shell script.
@jmencak About the install log: we are copying the log generated by the installer to OPENSHIFT_INSTALL_LOG_DIR at the end of the install instead of tee'ing stdout, since the installer's log contains timestamps as well.
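A rough sketch of that copy step, assuming an Ansible task on the host that ran the installer; the .openshift_install.log filename and the variable names are assumptions, not taken from this PR:

```yaml
# Hypothetical sketch: copy the installer-generated log (which carries its own
# timestamps) into the user-defined OPENSHIFT_INSTALL_LOG_DIR after the install.
- name: Copy the openshift-install log to the user-defined log directory
  copy:
    src: "{{ openshift_install_dir }}/.openshift_install.log"   # assumed install assets dir
    dest: "{{ lookup('env', 'OPENSHIFT_INSTALL_LOG_DIR') }}/"
```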
Thanks for preparing this! In general, I like the idea of centralizing most of the tooling into one repo. However, is the intention to keep all files necessary for running a specific workload in this repo and this repo only? Some workloads carry quite a few files and unless (1) the run.sh script clones the whole test repo or (2) a workload container image is used that already has the test repos baked in, this would require huge workload-<testname>-script-cm.yml.j2 files duplicating content of existing test repos. While this is certainly doable if we want to follow this path, the *.j2 files simply do not look very elegant to me, but the other approaches I can currently think of also have their disadvantages:
(1) limits the "run-on-any-cluster" due to external access to github (2) requires frequent rebuilding and probably tagging of the workload container image
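For readers who have not seen one, a minimal sketch of what a workload-<testname>-script-cm.yml.j2 amounts to under this approach; the names and script body are illustrative, not the actual contents of this PR:

```yaml
# Illustrative only: a ConfigMap template that carries the workload script inline,
# which is where the duplication of existing test-repo content would show up.
apiVersion: v1
kind: ConfigMap
metadata:
  name: scale-ci-workload-script        # hypothetical name
data:
  run.sh: |
    #!/bin/sh
    # ...test script body duplicated from the upstream test repo...
    echo "pbench agents enabled: {{ enable_pbench_agents }}"
```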
Nits: to make this run, I had to:
- use an old nightly 4.1.0-0.nightly-2019-05-16-090009
- change the default(fales, true) -> default(false, true) for enable_pbench_agents (see the sketch after this list)
- set workload_job_privileged to false
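As a quick illustration of the second nit, here is a hypothetical j2 fragment showing the corrected two-argument default filter; the surrounding line is made up:

```yaml
# Illustrative j2 fragment. With default(false, true), the second argument makes the
# fallback apply when the variable is undefined or empty; default(fales, true) instead
# points the fallback at an undefined name "fales".
enable_pbench_agents: "{{ enable_pbench_agents | default(false, true) }}"
```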
Thanks for preparing this! In general, I like the idea of centralizing most of the tooling into one repo.
Thanks Jiri!
However, is the intention to keep all files necessary for running a specific workload in this repo and this repo only?
The main intention is to reduce the burden of running a workload. Right now, as we all know, it not only requires a cluster built by our install automation, but the tests are also spread across several repos with inter-dependencies between them. I also think defining some clear boundaries on what belongs where will make it easier for all of us to run anyone else's workload. (An example would be that our workload container shouldn't be a catch-all, but rather just the image that hosts the tools/binaries we need for a workload.)
The process here to set up pbench and run NodeVertical is greatly simplified, and by using Ansible we can easily orchestrate this in Jenkins (in the same fashion as the install jobs for scale-ci clusters) and/or run it from your local machine or pointed at a jump host / orchestration host. It provides great flexibility while remaining simple to run. Of course this is just one of the several workloads we have; ideally we can get all workloads into the same repo to reduce the repo sprawl that has occurred.
The other objective in this PoC was to remove as many host-mounts as possible from the workload/pbench pods. This decouples the workloads from our install process; in fact, we can already eliminate the post-install steps that copy the kubeconfig and ssh keys to the nodes, since this implementation uses secrets to store the kubeconfig and ssh keys.
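A hedged sketch of what that looks like on the pod side, assuming Secret and mount names that are not taken from this PR:

```yaml
# Illustrative only: mount kubeconfig/ssh-key Secrets into the workload pod
# instead of host-mounting files copied onto the nodes after install.
apiVersion: v1
kind: Pod
metadata:
  name: scale-ci-workload              # hypothetical pod name
spec:
  containers:
  - name: workload
    image: quay.io/openshift-scale/scale-ci-workload
    volumeMounts:
    - name: kubeconfig
      mountPath: /root/.kube           # assumed mount path
    - name: ssh-key
      mountPath: /root/.ssh            # assumed mount path
  volumes:
  - name: kubeconfig
    secret:
      secretName: scale-ci-kubeconfig  # hypothetical Secret holding the kubeconfig
  - name: ssh-key
    secret:
      secretName: scale-ci-ssh-key     # hypothetical Secret holding the ssh keys
```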
Some workloads carry quite a few files and unless (1) the run.sh script clones the whole test repo or (2) a workload container image is used that already has the test repos baked in, this would require huge workload-<testname>-script-cm.yml.j2 files duplicating content of existing test repos. While this is certainly doable if we want to follow this path, the *.j2 files simply do not look very elegant to me, but the other approaches I can currently think of also have their disadvantages:
You bring up a good point; however, I do believe the j2 file is far more elegant than cross-repo inter-dependencies such as the current setup tooling job has today. But we can do better than one large j2 file. I envision something more along the lines of each workload generally following the same concepts laid out here, but not super strict compliance with them. We could simply lay out only the items that require some configuration/templating in the j2 and place the others that don't require it in separate files with the correct extensions.
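One hedged way to picture that split, as a pair of hypothetical playbook tasks (paths and names are illustrative): only the parameterized manifest goes through template, the static one is copied verbatim:

```yaml
# Illustrative only: template what needs variables, copy the rest as-is.
- name: Render the parameterized workload ConfigMap from its j2 template
  template:
    src: workload-nodevertical-script-cm.yml.j2    # holds only the configurable bits
    dest: "{{ workload_dir }}/workload-nodevertical-script-cm.yml"

- name: Copy static workload files that need no templating
  copy:
    src: files/nodevertical-static.yml             # hypothetical static manifest
    dest: "{{ workload_dir }}/nodevertical-static.yml"
```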
(1) limits the "run-on-any-cluster" due to external access to github (2) requires frequent rebuilding and probably tagging of the workload container image
The workload container image is automatically built from a Dockerfile in this repo https://github.com/openshift-scale/images via quay - https://quay.io/repository/openshift-scale/scale-ci-workload
Nits: to make this run, I had to:
- use an old nightly 4.1.0-0.nightly-2019-05-16-090009

I ran this yesterday using 4.1.0-0.nightly-2019-05-18-050636; it might have been that the install automation here still needed a fix on the machineset since the spec had changed, see #234.
- change the default(fales, true) -> default(false, true) for enable_pbench_agents.
Good catch.
- set workload_job_privileged to false
Again thanks for the feedback!
/cc @ekuric for more eyeballs, as we're likely to live with this in the future
For continuity, this work has been shifted over to this repo - https://github.com/openshift-scale/workloads to make it easier to git clone and run it.
POC Setup Tooling in a single repo
The only external dependency is a container image for the controller/pbench agent.
DNM - Do not merge, just test/provide feedback. Thanks!