opendevstack / ods-core

The core of OpenDevStack - infrastructure setup based on Atlassian tools, Jenkins, Nexus, SonarQube and shared images
Apache License 2.0

ODS in a Box with Vagrant #936

Open felipecruz91 opened 3 years ago

felipecruz91 commented 3 years ago

The concept of ODS in a box is a really great idea, and I am learning more about it in my spare time. I am interested in creating a cloud-agnostic VM image (not coupled to AWS, for instance) that can run on my laptop (not in the cloud) using Vagrant (https://www.vagrantup.com/) with VirtualBox.

I've forked ods-core (master branch) into my personal account and extended ODS in a box with the following files:

When I run packer build -on-error=ask ./ods-devenv/packer/CentOS2ODSBoxVagrant.json, Vagrant spins up a new VM in VirtualBox from the CentOS 7 base image and proceeds to install the Atlassian stack, OpenShift, etc. (using the bootstrap.sh script). However, it fails when it tries to build the nexus image:

Screenshots

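[image: nexus-build-error] https://user-images.githubusercontent.com/15997951/103633129-99680180-4f45-11eb-9cc6-78a9971085a0.PNG

[image: nexus-build-events] https://user-images.githubusercontent.com/15997951/103633310-da601600-4f45-11eb-8514-de9dec66aa02.PNG

Affected version (please complete the following information):

  • OpenShift: 3.11
  • OpenDevStack (master branch)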

Additional context: From the error message, I understand that the user does not have the RBAC permissions needed to list the secrets in the ods namespace.

==> vagrant: Warning: Group 'system:authenticated' not found
    vagrant: role "view" added: "system:authenticated"
==> vagrant: + oc adm policy add-cluster-role-to-group system:image-puller system:authenticated -n ods
==> vagrant: Warning: Group 'system:authenticated' not found
    vagrant: cluster role "system:image-puller" added: "system:authenticated"
==> vagrant: + oc adm policy add-cluster-role-to-user self-provisioner system:serviceaccount:ods:jenkins
==> vagrant: Warning: ServiceAccount 'jenkins' not found
    vagrant: cluster role "self-provisioner" added: "system:serviceaccount:ods:jenkins"
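One way to test the RBAC hypothesis directly is to ask the cluster whether a given subject may list secrets. The sketch below wraps oc auth can-i; the function name can_list_secrets is made up for illustration, and which subject actually runs the nexus build is an assumption that would need to be confirmed on the box. It checks the printed answer rather than the exit code, because older clients may only set the exit code in quiet mode.

```shell
#!/usr/bin/env bash
# Report whether a subject may list secrets in a namespace.
# Usage: can_list_secrets <namespace> [<subject>]
# If <subject> is omitted, the currently logged-in user is checked.
can_list_secrets() {
    local namespace="$1" subject="${2:-}"
    local args=(auth can-i list secrets -n "${namespace}")
    if [[ -n "${subject}" ]]; then
        # Impersonate the subject (requires impersonation rights, e.g. cluster-admin).
        args+=("--as=${subject}")
    fi
    local answer
    answer="$(oc "${args[@]}" 2>/dev/null || true)"
    # "oc auth can-i" prints "yes" or "no" (optionally followed by a reason).
    if [[ "${answer}" == yes* ]]; then
        echo "allowed"
    else
        echo "denied"
        return 1
    fi
}
```

For example, can_list_secrets ods system:serviceaccount:ods:builder would check the builder service account, assuming that is the account the build runs as.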

felipecruz91 commented 3 years ago

After this failure, I triggered the nexus build manually with oc start-build nexus -n ods, and it completed successfully. I guess it is a timing issue: the secrets have not been created yet by the time the build starts.

It might help to wait for all the resources to be created in OpenShift before calling the setup_nexus function (https://github.com/felipecruz91/ods-core/blob/feature/ods-box/ods-devenv/scripts/deploy.sh#L1887).
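A minimal sketch of such a wait, assuming the missing resource can be detected with a plain oc get: the helper polls an arbitrary command until it succeeds or a deadline passes. The name wait_until and the secret name in the usage note are hypothetical and not part of ods-core.

```shell
#!/usr/bin/env bash
# Poll a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    # SECONDS is a bash built-in counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "${interval}"
    done
}
```

It could then guard the build, for example: wait_until 120 5 oc get secret SECRET_NAME -n ods && oc start-build nexus -n ods, where SECRET_NAME stands for whichever secret the build needs.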

clemensutschig commented 3 years ago

Do you have the error message from the failed run?

clemensutschig commented 3 years ago

Can you try this again with a wait/sleep before the start-build?

felipecruz91 commented 3 years ago

@clemensutschig It seems to be a timing issue, yes.

The second run got much further and started testing the ODS installation, but the Prov App does not seem to be running:

    vagrant: === RUN   TestVerifyOdsProjectProvisionThruProvisionApi
    vagrant:     provision-api_test.go:40: Failed to delete project: Execution of `provisioning-app-api.sh` for 'DELETE/ODSVERIFY' failed:
    vagrant:         StdOut: No config file found, assuming defaults, current dir: /home/openshift/opendevstack/ods-core/tests/smoketest
    vagrant:
    vagrant:         Started provision project/component script with command (DELETE)!
    vagrant:
    vagrant:         ... encoding basic auth credentials in base64 format
    vagrant:
    vagrant:         ... sending request to 'https://prov-app-ods.ocp.odsbox.lan' (output will be saved in file './response.txt' and headers in file './headers.txt')

(omitted logs)

    vagrant:         < HTTP/1.0 503 Service Unavailable
    vagrant:         < Pragma: no-cache
    vagrant:         < Cache-Control: private, max-age=0, no-cache, no-store
    vagrant:         < Connection: close
    vagrant:         < Content-Type: text/html
    vagrant:         <
    vagrant:         { [data not shown]
    vagrant: 100  3265    0  3265    0     0   7805      0 --:--:-- --:--:-- --:--:--  7811
    vagrant:         * Closing connection 0
    vagrant:         Error from server (NotFound): namespaces "ODSVERIFY" not found
    vagrant:
    vagrant:         Err: exit status 1
    vagrant: --- FAIL: TestVerifyOdsProjectProvisionThruProvisionApi (2.52s)

After some investigation, it seems that the DeploymentConfig resource of the Prov App is not created by Tailor (only ImageStream and BuildConfig resources are).

function setup_provisioning_app() {
    echo "Setting up provisioning app"
    echo "make apply-provisioning-app-build:"
    pushd ods-provisioning-app/ocp-config

    # Apply only the ImageStream and BuildConfig resources.
    tailor apply --namespace "${NAMESPACE}" is,bc --non-interactive --verbose
    popd

    echo "make start-provisioning-app-build:"
    ocp-scripts/start-and-follow-build.sh --namespace "${NAMESPACE}" --build-config ods-provisioning-app --verbose
    # ^^^ The script exits here after waiting for the build status to be complete.
    # The lines below never get executed, which means that the
    # DeploymentConfig of the Prov App does not get created.

    echo "make apply-provisioning-app-deploy:"
    pushd ods-provisioning-app/ocp-config
    # Apply everything except the ImageStream and BuildConfig resources.
    tailor apply --namespace "${NAMESPACE}" --exclude is,bc --non-interactive --verbose
    # roll back change to suppress confluence adapter
    git reset --hard
    popd
}

michaelsauter commented 3 years ago

@felipecruz91 Could be a resource issue ... https://gist.github.com/felipecruz91/1c124fe2fb24ec3ab341dc93b8256751#file-ods-box-vagrant-logs-txt-L126-L148 shows that the build is pending and never progresses ...

felipecruz91 commented 3 years ago

@michaelsauter Yes, the lack of resources makes the build take longer than expected 😅, but it does complete successfully after about 5 minutes. The problem is that the script bails out before the build completes, so the rest of the resources, such as the DeploymentConfig, are never applied (they would be applied right after make apply-provisioning-app-deploy).

clemensutschig commented 3 years ago

All of the start-build/rollout calls have --follow. You could try setting or increasing the timeout.

felipecruz91 commented 3 years ago

Increasing the timeout solves this particular issue. As a suggestion for the future, it would be nice to monitor the build status and make the script wait until the build has actually completed (instead of waiting a fixed time and hoping the build finishes within the timeout period).
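As a rough sketch of what that could look like (not how start-and-follow-build.sh currently works), the function below polls the phase of a build object and returns as soon as it reaches a terminal phase. The name wait_for_build, its return codes, and the default timeout and interval are assumptions made for illustration; the phase names are the standard OpenShift build phases.

```shell
#!/usr/bin/env bash
# Wait until an OpenShift build reaches a terminal phase.
# Usage: wait_for_build <build-name> <namespace> [timeout_seconds] [interval_seconds]
# Returns 0 on Complete, 1 on Failed/Error/Cancelled, 2 on timeout.
wait_for_build() {
    local build="$1" namespace="$2" timeout="${3:-900}" interval="${4:-10}"
    local deadline=$(( SECONDS + timeout ))
    local phase
    while true; do
        # An empty phase (e.g. build object not visible yet) is treated as "keep waiting".
        phase="$(oc get build "${build}" -n "${namespace}" \
            -o jsonpath='{.status.phase}' 2>/dev/null || true)"
        case "${phase}" in
            Complete)
                echo "Build ${build} completed"
                return 0
                ;;
            Failed|Error|Cancelled)
                echo "Build ${build} ended in phase ${phase}" >&2
                return 1
                ;;
        esac
        if (( SECONDS >= deadline )); then
            echo "Timed out after ${timeout}s; last phase: ${phase:-unknown}" >&2
            return 2
        fi
        sleep "${interval}"
    done
}
```

A call such as wait_for_build ods-provisioning-app-1 ods 1800 (the build name is a placeholder) would still have an upper bound, but a slow build would no longer abort the rest of the setup as long as it finishes within that bound, and a failed build would be reported immediately.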

clemensutschig commented 3 years ago

Felipe, feel free to change it and open a PR ;-)