shawn-hurley closed 6 years ago
Do you have this image built and staged anywhere? If you'd be willing to do that with some tag for runner, I can rebuild my image and test my operator built off the ansible-operator against this to give you some feedback.
Generally speaking, this is working for me when running my setup (10+ roles, 15+ services) with the ansible-runner based operator. The Ansible configuration option stdout_callback = actionable does not seem to be respected, but that may be an issue with ansible-runner. I am going to try to verify that independently, but I would not hold this change up on account of it. I also see status information now.
Is there a way to configure custom status? Is that planned?
I dug more into ansible-runner, and it uses a custom stdout_callback to emit the event data as files. So I played around a bit with how to mimic the behavior I wanted. Here's a diff and sample output to kinda show my thoughts on reducing noise so that the operator gives better feedback when using ansible-runner:
Diff
diff --git a/pkg/runner/runner.go b/pkg/runner/runner.go
index db0a2fd..6d5aaa8 100644
--- a/pkg/runner/runner.go
+++ b/pkg/runner/runner.go
@@ -36,6 +36,18 @@ func (e EventTime) MarshalJSON() ([]byte, error) {
return []byte(fmt.Sprintf("\"%s\"", e.Time.Format("2006-01-02T15:04:05.99999999"))), nil
}
+type StdOut struct {
+ Changed bool `json:"changed"`
+ Error int `json:"error"`
+ Item string `json:"item"`
+ Msg string `json:"msg"`
+}
+
+type EventData struct {
+ Task string `json:"task"`
+ Role string `json:"role"`
+}
+
// JobEvent - event of an ansible run.
type JobEvent struct {
UUID string `json:"uuid"`
@@ -44,7 +56,7 @@ type JobEvent struct {
StartLine int `json:"start_line"`
EndLine int `json:"EndLine"`
Event string `json:"event"`
- EventData map[string]interface{} `json:"event_data"`
+ EventData EventData `json:"event_data"`
PID int `json:"pid"`
Created EventTime `json:"created"`
}
@@ -106,8 +118,6 @@ func (p *Playbook) Run(parameters map[string]interface{}, name, namespace string
logrus.Infof("running: %v for playbook: %v", ident, p.Path)
dc := exec.Command("ansible-runner", "-vv", "-p", "playbook.yaml", "-i", fmt.Sprintf("%v", ident), "run", runnerSandbox)
- dc.Stdout = os.Stdout
- dc.Stderr = os.Stderr
err = dc.Run()
if err != nil {
return nil, err
@@ -123,6 +133,28 @@ func (p *Playbook) Run(parameters map[string]interface{}, name, namespace string
return nil, fmt.Errorf("Unable to read event data")
}
sort.Sort(fileInfos(eventFiles))
+
+ for i := 0; i < len(eventFiles); i++ {
+ file, _ := ioutil.ReadFile(fmt.Sprintf("%v/artifacts/%v/job_events/%v", runnerSandbox, ident, eventFiles[i].Name()))
+ jobEvent := JobEvent{}
+
+ err = json.Unmarshal(file, &jobEvent)
+
+ if jobEvent.Event == "runner_item_on_changed" || jobEvent.Event == "runner_item_on_failed" {
+ split := strings.Split(jobEvent.StdOut, "=> ")
+ stdoutJson := split[1]
+
+ stdOut := StdOut{}
+ err = json.Unmarshal([]byte(stdoutJson), &stdOut)
+ logrus.WithFields(logrus.Fields{
+ "changed": stdOut.Changed,
+ "error": stdOut.Error,
+ "task": jobEvent.EventData.Task,
+ "role": jobEvent.EventData.Role,
+ }).Info(stdOut.Msg)
+ }
+ }
+
//get the last event, which should be a status.
d, err := ioutil.ReadFile(fmt.Sprintf("%v/artifacts/%v/job_events/%v", runnerSandbox, ident, eventFiles[len(eventFiles)-1].Name()))
if err != nil {
Output:
time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-https\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-https\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-http-pub\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-http-pub\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
time="2018-07-19T16:44:34Z" level=info msg="Failed to retrieve requested object: {\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"routes \\\"foreman-http-pulp\\\" is forbidden: User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in the namespace \\\"foreman\\\": User \\\"system:serviceaccount:foreman:foreman-operator\\\" cannot get routes in project \\\"foreman\\\"\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"foreman-http-pulp\",\"kind\":\"routes\"},\"code\":403}\n" changed=false error=403 role=foreman-routes task="foreman routes"
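The diff above indexes split[1] and drops the error from json.Unmarshal, so an event whose stdout lacks the "=> " separator would panic. A minimal sketch of a more defensive parse follows. The itemResult type mirrors the StdOut struct from the diff, while the parseItemStdout helper and the sample stdout line are hypothetical and only illustrate the idea:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// itemResult mirrors the StdOut struct from the diff.
type itemResult struct {
	Changed bool   `json:"changed"`
	Error   int    `json:"error"`
	Item    string `json:"item"`
	Msg     string `json:"msg"`
}

// parseItemStdout (hypothetical helper) extracts the JSON payload that
// follows the first "=> " in an event's stdout and decodes it.
// It returns an error instead of panicking when the separator is missing
// or the payload is not valid JSON.
func parseItemStdout(stdout string) (itemResult, error) {
	var res itemResult
	_, payload, found := strings.Cut(stdout, "=> ")
	if !found {
		return res, fmt.Errorf("no \"=> \" separator in stdout")
	}
	if err := json.Unmarshal([]byte(payload), &res); err != nil {
		return res, fmt.Errorf("decoding item result: %w", err)
	}
	return res, nil
}

func main() {
	// Made-up stdout line in the shape the diff appears to expect.
	line := `failed: (item=foreman-https) => {"changed": false, "error": 403, "item": "foreman-https", "msg": "Forbidden"}`
	res, err := parseItemStdout(line)
	if err != nil {
		fmt.Println("skipping event:", err)
		return
	}
	fmt.Printf("changed=%v error=%d msg=%q\n", res.Changed, res.Error, res.Msg)
}
```

With this shape, the loop in the diff could log and skip an unparseable event rather than abort the whole run.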
One additional thought on ansible-runner: since it creates data on the file system that could grow over time as the operator continues to run, have you given thought to rotating job_events off the file system?
@ehelms We have not, but now we will 😄
I will be adding this to our documentation and feature tracker so we don't lose this :)
@ehelms re: your diff, I like the concept. I was thinking of something that would not wait for the entire job to complete, but would instead watch in a goroutine for new events to be added, read and log them as they arrive, and then publish a k8s event for the CR with the status of that event. This is just my personal initial thought though, and we will be working on a more concrete design/road map where we will probably answer those questions. Does that sound like a good idea to you though?
@shawn-hurley I think that sounds solid. My only request would be to make it configurable whether to stream all events or only change and failure events to the logger for easier debugging of roles.
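The configurable filter requested here could look roughly like the following sketch. The mode names, the parseLogMode and shouldLog helpers, and the "all" / "changes" setting values are hypothetical. The two event names are the ones matched in the diff earlier in this thread, and a real implementation would probably need a longer list:

```go
package main

import (
	"fmt"
	"strings"
)

// logMode selects which job events are forwarded to the logger.
type logMode int

const (
	logAll                logMode = iota // stream every event
	logChangesAndFailures                // only change and failure events
)

// changeOrFailure lists the event names treated as changes or failures.
// Only the two names used in the diff are included; extend as needed.
var changeOrFailure = map[string]bool{
	"runner_item_on_changed": true,
	"runner_item_on_failed":  true,
}

// parseLogMode maps a (hypothetical) configuration value to a logMode.
func parseLogMode(value string) (logMode, error) {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "", "all":
		return logAll, nil
	case "changes":
		return logChangesAndFailures, nil
	default:
		return logAll, fmt.Errorf("unknown log mode %q", value)
	}
}

// shouldLog reports whether an event with the given name passes the filter.
func shouldLog(mode logMode, event string) bool {
	if mode == logAll {
		return true
	}
	return changeOrFailure[event]
}

func main() {
	mode, err := parseLogMode("changes")
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"runner_on_ok", "runner_item_on_failed"} {
		fmt.Printf("%s logged=%v\n", name, shouldLog(mode, name))
	}
}
```

Defaulting an empty value to streaming everything keeps the current behavior unless the user opts in to the quieter mode.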