water-hole / ansible-operator

POC Code for the operator backed by ansible

Ansible runner integration POC #2

Closed shawn-hurley closed 6 years ago

shawn-hurley commented 6 years ago
ehelms commented 6 years ago

Do you have this image built and staged anywhere? If you'd be willing to do that with some tag for runner, I can rebuild my image and test my operator built off the ansible-operator against this to give you some feedback.

ehelms commented 6 years ago

Generally speaking, this is working for me when running my setup (10+ roles, 15+ services) with the ansible-runner based operator. The Ansible configuration option stdout_callback = actionable does not seem to be respected, but that may be an issue with ansible-runner itself. I am going to try to verify that independently, but I would not hold this change up on account of it. I also see status information now.

Is there a way to configure custom status? Is that planned?

ehelms commented 6 years ago

I dug more into ansible-runner, and it uses a custom stdout_callback to emit the event data as files. So I played around a bit with how to mimic the behavior I wanted. Here's a diff and sample output to kinda show my thoughts on reducing noise so that the operator gives better feedback when using ansible-runner:

Diff

diff --git a/pkg/runner/runner.go b/pkg/runner/runner.go
index db0a2fd..6d5aaa8 100644
--- a/pkg/runner/runner.go
+++ b/pkg/runner/runner.go
@@ -36,6 +36,18 @@ func (e EventTime) MarshalJSON() ([]byte, error) {
        return []byte(fmt.Sprintf("\"%s\"", e.Time.Format("2006-01-02T15:04:05.99999999"))), nil
 }

+type StdOut struct {
+       Changed bool    `json:"changed"`
+       Error           int                     `json:"error"`
+       Item            string  `json:"item"`
+       Msg                     string  `json:"msg"`
+}
+
+type EventData struct {
+       Task string     `json:"task"`
+       Role string     `json:"role"`
+}
+
 // JobEvent - event of an ansible run.
 type JobEvent struct {
        UUID      string                 `json:"uuid"`
@@ -44,7 +56,7 @@ type JobEvent struct {
        StartLine int                    `json:"start_line"`
        EndLine   int                    `json:"EndLine"`
        Event     string                 `json:"event"`
-       EventData map[string]interface{} `json:"event_data"`
+       EventData EventData                                              `json:"event_data"`
        PID       int                    `json:"pid"`
        Created   EventTime              `json:"created"`
 }
@@ -106,8 +118,6 @@ func (p *Playbook) Run(parameters map[string]interface{}, name, namespace string
        logrus.Infof("running: %v for playbook: %v", ident, p.Path)

        dc := exec.Command("ansible-runner", "-vv", "-p", "playbook.yaml", "-i", fmt.Sprintf("%v", ident), "run", runnerSandbox)
-       dc.Stdout = os.Stdout
-       dc.Stderr = os.Stderr
        err = dc.Run()
        if err != nil {
                return nil, err
@@ -123,6 +133,28 @@ func (p *Playbook) Run(parameters map[string]interface{}, name, namespace string
                return nil, fmt.Errorf("Unable to read event data")
        }
        sort.Sort(fileInfos(eventFiles))
+
+       for i := 0; i < len(eventFiles); i++ {
+               file, _ := ioutil.ReadFile(fmt.Sprintf("%v/artifacts/%v/job_events/%v", runnerSandbox, ident, eventFiles[i].Name()))
+               jobEvent := JobEvent{}
+
+               err = json.Unmarshal(file, &jobEvent)
+
+               if jobEvent.Event == "runner_item_on_changed" || jobEvent.Event == "runner_item_on_failed" {
+                       split := strings.Split(jobEvent.StdOut, "=> ")
+                       stdoutJson := split[1]
+
+                       stdOut := StdOut{}
+                       err = json.Unmarshal([]byte(stdoutJson), &stdOut)
+                       logrus.WithFields(logrus.Fields{
+                               "changed": stdOut.Changed,
+                               "error": stdOut.Error,
+                               "task": jobEvent.EventData.Task,
+                               "role": jobEvent.EventData.Role,
+                       }).Info(stdOut.Msg)
+               }
+       }
+
        //get the last event, which should be a status.
        d, err := ioutil.ReadFile(fmt.Sprintf("%v/artifacts/%v/job_events/%v", runnerSandbox, ident, eventFiles[len(eventFiles)-1].Name()))
        if err != nil {

Output:

time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-https\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-https\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-http-pub\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-http-pub\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-http-pulp\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-http-pulp\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
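
One caveat with the diff above: it indexes split[1] without checking that "=> " was found, and it drops the errors from ioutil.ReadFile and json.Unmarshal, so an event whose stdout has no separator would panic the operator. Below is a minimal sketch of a more defensive parse. The names taskResult and parseTaskResult are hypothetical, and the sample line in main is only an illustrative guess at what the stdout of a failed item looks like.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// taskResult mirrors the StdOut struct from the diff (hypothetical name).
type taskResult struct {
	Changed bool   `json:"changed"`
	Error   int    `json:"error"`
	Item    string `json:"item"`
	Msg     string `json:"msg"`
}

// parseTaskResult decodes the JSON payload that follows the first "=> "
// in an event's stdout. It returns an error instead of panicking when the
// separator is missing or the payload is not valid JSON.
func parseTaskResult(stdout string) (taskResult, error) {
	var res taskResult
	// SplitN with n=2 keeps any later "=> " inside the payload intact.
	parts := strings.SplitN(stdout, "=> ", 2)
	if len(parts) != 2 {
		return res, fmt.Errorf("no \"=> \" separator in stdout: %q", stdout)
	}
	if err := json.Unmarshal([]byte(parts[1]), &res); err != nil {
		return res, fmt.Errorf("decoding task result: %w", err)
	}
	return res, nil
}

func main() {
	// Illustrative input only; real stdout may differ.
	line := `failed: (item=foreman-https) => {"changed": false, "error": 403, "item": "foreman-https", "msg": "forbidden"}`
	res, err := parseTaskResult(line)
	if err != nil {
		fmt.Println("skipping event:", err)
		return
	}
	fmt.Printf("changed=%v error=%d item=%s msg=%q\n", res.Changed, res.Error, res.Item, res.Msg)
}
```

With this shape, the loop in the diff could log and skip a malformed event rather than abort the whole run.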

ehelms commented 6 years ago

One additional thought on ansible-runner: since it writes data to the file system, which could grow over time as the operator continues to run, have you given thought to rotating job_events off the file system?

shawn-hurley commented 6 years ago

@ehelms We have not, but now we will 😄

I will be adding this to our documentation and feature tracker so we don't lose this :)

shawn-hurley commented 6 years ago

@ehelms re: your diff, I like the concept. I was thinking of something that would not wait for the entire job to complete, but would instead watch in a goroutine for new events to be added, read them, log them there, and then publish a k8s event for the CR with the status of that event. This is just my personal initial thought, though, and we will be working on a more concrete design/roadmap where we will probably answer those questions. Does that sound like a good idea to you?

ehelms commented 6 years ago

@shawn-hurley I think that sounds solid. My only request would be to make it configurable whether to stream all events or only change and failure events to the logger for easier debugging of roles.
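
As a sketch of what that switch could look like: a small filter that either passes every event or only the change and failure ones. The logMode type and shouldLog function are hypothetical. The first two event names come from the diff earlier in this thread, and runner_on_failed is an assumed addition for tasks that do not loop over items.

```go
package main

import "fmt"

// logMode selects how much of the event stream reaches the logger.
type logMode int

const (
	logAll                logMode = iota // stream every event
	logChangesAndFailures                // only events that changed or failed
)

// actionable lists the event names treated as change or failure events.
var actionable = map[string]bool{
	"runner_item_on_changed": true, // from the diff
	"runner_item_on_failed":  true, // from the diff
	"runner_on_failed":       true, // assumption: non-item task failure
}

// shouldLog reports whether an event with the given name should be logged.
func shouldLog(mode logMode, event string) bool {
	if mode == logAll {
		return true
	}
	return actionable[event]
}

func main() {
	for _, ev := range []string{"runner_on_ok", "runner_item_on_failed"} {
		fmt.Printf("%s: all=%v filtered=%v\n",
			ev, shouldLog(logAll, ev), shouldLog(logChangesAndFailures, ev))
	}
}
```

How the mode gets set (a flag, an environment variable, or a field on the CR) is left open here.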