redhat-openstack / tripleo-quickstart

Ansible roles for setting up TripleO virtual environments and building images

Do gate jobs need to perform an overcloud deploy? #31

Closed larsks closed 8 years ago

larsks commented 8 years ago

The gate jobs take a looooong time to complete. Do they actually need to perform a full overcloud deploy? Can we have separate short-running jobs that only get as far as booting the undercloud node so that we can get faster feedback on changes (while still testing out the full overcloud deploy)?

trown commented 8 years ago

This is tricky. It is possible that not all changes require a full deploy. However, changing something in the libvirt roles can break deploy. For instance, if we change the memory for the fake baremetal nodes, that would affect deploy.

The tricky part, then, is how to decide whether a change could possibly break deploy. Right now we use a regex in the gerrit trigger to decide whether we need to run the full image build and deploy job, but that case is pretty clear: if we change the images role or the build-images playbook, then we run that job.

Do you have some ideas for parts of the tree that are safe to run only the undercloud? I can mostly only think of the reverse, which could also be the basis of the regex for this.
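
As a rough illustration of the "reverse" regex idea above, here is a minimal Python sketch that decides whether a change touches paths that could break deploy. The path patterns are hypothetical placeholders, not the project's actual trigger configuration, and a real gerrit trigger would express this in its own file-path settings rather than in Python.

```python
import re

# Hypothetical patterns for paths that could affect the overcloud deploy.
# These are illustrative only and do not reflect the real repository layout.
DEPLOY_SENSITIVE_PATTERNS = [
    re.compile(r"^roles/libvirt/"),
    re.compile(r"^roles/images/"),
    re.compile(r"^playbooks/build-images\.yml$"),
]


def needs_full_job(changed_files):
    """Return True if any changed path matches a deploy-sensitive pattern."""
    return any(
        pattern.search(path)
        for path in changed_files
        for pattern in DEPLOY_SENSITIVE_PATTERNS
    )
```

With these assumed patterns, a patch that only touches `README.md` would skip the full job, while one that edits anything under `roles/libvirt/` would run it.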

larsks commented 8 years ago

> This is tricky. It is possible that not all changes require a full deploy. However, changing something in the libvirt roles can break deploy.

This is why I was suggesting having two jobs: one that does not run a full deploy and completes quickly, so that if you've introduced something that breaks the early stages you know quickly, while having a second job that still runs the full deploy.

> Do you have some ideas for parts of the tree that are safe to run only the undercloud?

I don't think there's a way to declare anything "safe" in terms of deciding whether to trigger a full deploy or not. I think we would need to run both sets of tests before accepting a patch.

I was mostly thinking of ways to provide faster negative feedback.

apevec commented 8 years ago

> one that does not run a full deploy and completes quickly... while having a second job that still runs the full deploy.

Could those two be configured in a pipeline like rdo-promote, so that the 2nd one runs only once the 1st one is successful?
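
The sequencing asked about here can be sketched in a few lines of Python. This is only a model of the idea (run stages in order, stop at the first failure); the stage names are made up, and it is not how rdo-promote or Jenkins is actually configured.

```python
def run_pipeline(stages):
    """Run (name, job) stages in order and stop at the first failure.

    Each job is a callable returning True on success. The return value is
    the list of stage names that were actually started.
    """
    executed = []
    for name, job in stages:
        executed.append(name)
        if not job():
            # A failed stage blocks everything after it.
            break
    return executed


if __name__ == "__main__":
    # Hypothetical stages: the quick job fails, so the full deploy never runs.
    stages = [
        ("undercloud-only", lambda: False),
        ("full-deploy", lambda: True),
    ]
    print(run_pipeline(stages))  # ['undercloud-only']
```

The point of this arrangement is that the expensive stage is only paid for when the cheap one has already passed.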

trown commented 8 years ago

I like the idea of having a job that tests the new user experience. I am not totally sure how to test quickstart.sh directly as referenced in https://github.com/redhat-openstack/tripleo-quickstart/issues/32. However, if we use quickstart.yml in CI we would get a pretty close approximation. We would then want one job using the default tags in quickstart.sh, and another using all tags. The default tags job would fit this issue. The all tags job would be equivalent to what we have now.

We could also test cleanup in the default tags job, since it would run pretty quickly.

I am not sure we would get different feedback from the two jobs other than cleanup. If the new default tags job failed before deploy, I would think the all tags job would fail in the same place in the same amount of time. What scenario am I missing?

One thing we could do is only have the default tags job be on the gerrit-trigger. Then when a core reviewer +2's a patch, they comment "full-deploy check" or the like, and it will run the full CI. This would have the benefit of not needing to figure out how to auto-kill jobs for a previous patchset when a new patchset is uploaded, since we would only be running the hour-long job intentionally, when we are otherwise ready to merge.
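
To make the proposed split concrete, here is a small Python sketch of the decision: new patchsets get only the quick job, and a reviewer comment requests the full one. The job names are hypothetical, the comment phrase is the example from above, and the event type strings are a simplified stand-in for what a gerrit trigger would actually match on.

```python
import re

# Hypothetical job names, loosely following the discussion above.
QUICK_JOB = "default-tags"
FULL_JOB = "all-tags"

# Match the trigger phrase when it appears alone on a line of the comment.
FULL_CHECK_COMMENT = re.compile(r"^\s*full-deploy check\s*$", re.MULTILINE)


def jobs_for_event(event_type, comment=""):
    """Return the list of jobs to start for a simplified review event."""
    if event_type == "patchset-created":
        # Every new patchset gets the short job automatically.
        return [QUICK_JOB]
    if event_type == "comment-added" and FULL_CHECK_COMMENT.search(comment):
        # The long job runs only when a reviewer explicitly asks for it.
        return [FULL_JOB]
    return []
```

Because the long job never starts automatically, there is nothing to cancel when a new patchset arrives.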

trown commented 8 years ago

https://github.com/redhat-openstack/tripleo-quickstart/issues/30 is also related to the above comment.

larsks commented 8 years ago

> I am not totally sure how to test quickstart.sh directly as referenced in #32

Maybe https://review.gerrithub.io/#/c/265907/ would help with that.

trown commented 8 years ago

Indeed, that was one of the issues. The other is where to run quickstart.sh from. If we run it directly from the jenkins slave with the working-dir option set to the jenkins provided workspace it should be safe, but we would need to be careful we don't cross-contaminate between the jobs.

Alternatively, we could run it on the provided baremetal host, but that requires writing some tiny playbook to download and run it there. Getting logs would also be tricky, since we are still using log collection from khaleesi, which would need the hosts/ssh.config.ansible to work.
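
For the first option (running from the Jenkins workspace), one way to avoid cross-contamination is to derive a separate working directory per job and build. The sketch below is an assumption about how that could look, not something the project does today; the `quickstart-work` directory name and the helper itself are hypothetical. The inputs would typically come from the `WORKSPACE`, `JOB_NAME` and `BUILD_NUMBER` variables that Jenkins exports.

```python
import os
import re


def job_working_dir(workspace, job_name, build_number):
    """Build a working directory path unique to one build of one job.

    Characters that are awkward in directory names are replaced with "_".
    """
    safe_job = re.sub(r"[^A-Za-z0-9_.-]", "_", job_name)
    return os.path.join(workspace, "quickstart-work", f"{safe_job}-{build_number}")
```

The resulting path could then be passed to quickstart.sh through its working-dir option, so two jobs sharing a workspace never write to the same directory.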

larsks commented 8 years ago

> One thing we could do is only have the default tags job be on the gerrit-trigger. Then when a core reviewer +2's a patch, they comment "full-deploy check" or the like,

I wonder if we could just trigger on the +2, and not need an explicit comment?

trown commented 8 years ago

> I wonder if we could just trigger on the +2, and not need an explicit comment?

I checked and there is a gerrit trigger option for this.

trown commented 8 years ago

Resolved by https://github.com/redhat-openstack/tripleo-quickstart/commit/389d2e9af6eb3fa00fc5096246de33b71279e3a5