Quick question. In the code you linked to, does the rw.watcher.Watch()
call return the error you need, and is the logging you're referring to the logging in that goroutine?
Yeah, you're correct: the error is returned by rw.watcher.Watch(),
and I was referring to the logging in that goroutine ("error monitoring template dependencies").
Note that, related to the isDirty() check: if you call t.Notify(nil)
before t.Execute(recaller),
it will run without your having to comment that out.
Calling t.Notify(nil) before t.Execute(recaller)
didn't work for me, but I tried calling it after the execute instead and that fixed my issue! Sorry, I now realize I should have been clearer about what I meant by "not working".
What I was seeing before adding the Notify (or, earlier, before commenting out the isDirty check) was that in the happy path, where the validate call succeeded, the watcher indicated zero dependencies (as returned by watcher.Size()). CTS would then hang since no dependencies were being watched.
2022-01-31T10:31:42.800-0600 [INFO] ctrl: drivers initialized
2022-01-31T10:31:42.800-0600 [INFO] ctrl: executing all tasks once through
2022-01-31T10:31:42.800-0600 [TRACE] driver.terraform: checking dependency changes for task: task_name=test-task
2022-01-31T10:31:42.800-0600 [DEBUG] ctrl: watching dependencies: dependency_size=0
But with notifying after executing, everything is working as expected! I'll be doing some cleanup on this branch and then change this PR to ready for review. Thanks for helping me out @eikenb!
Oh... yeah. I didn't get the issue. Glad you figured it out!
The whole 'dirty' thing comes from filesystems and how they mark a segment of memory as needing to be synced to disk, aka 'dirty'. It's meant to be a marker that is cleared when the work is done, leaving that memory (the Notifier, in this case) alone until it has been marked again. Sorry if I'm over-explaining and you already know this; I just thought I'd share my inspiration/thinking.
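In rough pseudo-Go, the idea is something like this (illustrative only, not hcat's actual implementation):

// Illustrative sketch of the dirty-flag pattern, not hcat's actual code.
type notifier struct {
	dirty bool
}

// Marking: something changed, so flag the notifier as needing work,
// like marking a page of memory that needs to be synced to disk.
func (n *notifier) Notify(interface{}) {
	n.dirty = true
}

// Doing the work: only act if marked, then clear the flag and leave the
// notifier alone until it is marked again.
func (n *notifier) execute() {
	if !n.dirty {
		return
	}
	// ... render/sync work would happen here ...
	n.dirty = false
}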
Rebased, added tests, and updated all the query functions (except a few test ones) to return any recaller errors.
Playing around with options, I think I was able to implement this validate logic externally, only assuming access to the clients, and using the closure's scope to capture the errors. Something like...
func ValidateTemplate(t *hcat.Template, clients hcat.Looker) error {
	var outerErr error
	// Recaller that fetches each dependency directly and captures any
	// fetch error via the enclosing scope.
	recaller := func(d dep.Dependency) (interface{}, bool) {
		data, _, err := d.Fetch(clients)
		if err != nil {
			outerErr = err
			return nil, false
		}
		return data, true
	}
	t.Execute(recaller)
	// Notify after the execute, per the discussion above, so the watcher
	// still picks up the template's dependencies afterwards.
	t.Notify(nil)
	return outerErr
}
Do you see any reason this wouldn't work?
Tried it out, and the only thing I needed was to add a method on the Watcher to return its clients! I keep forgetting about closures; that's such a nicer way to get the error than changing the Recaller signature. This also has the bonus of returning the error from the fetch without wrapping it in information about the template. I was thinking the template IDs and template-related error wording were too detailed for our API user.
I'm happy to close this out in favor of moving the validation to CTS, but I want to get your take on whether we'd benefit from having this bit of validation logic in hcat. It may be a helpful pattern to make available. Let me know what you think!
Great! Glad you liked my idea.
My initial take is to have you re-implement it in CTS for now. It does seem like it could be useful, but I'm not sure if it would be useful for anyone else and it's a lot easier to add something later than to take something away.
I am planning on adding it as an example/test in the doc_test.go file though. That should help convey the idea and make sure the pattern stays working.
For client access, I'm just going to make the clients field on the watcher object public (watcher.clients -> watcher.Clients).
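On the CTS side the usage should then end up looking something like this (just a sketch, reusing the ValidateTemplate helper from above; exact names/types may differ):

// Sketch only: CTS-side validation once the watcher's clients are exported.
// ValidateTemplate is the helper from the earlier comment; the shape of the
// exported Clients field is assumed and may differ.
func validateTaskTemplate(w *hcat.Watcher, tmpl *hcat.Template) error {
	if err := ValidateTemplate(tmpl, w.Clients); err != nil {
		// The caller can deregister the template and return this error to
		// the API client instead of waiting on RenderTemplate.
		return err
	}
	return nil
}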
This is to fix an issue with the CTS create task API where the request hangs when a call to Consul fails.
If, for example, a task created via the API specifies a datacenter that does not exist, the controller's main Run method detects the error through the watcher's error channel and logs it. However, the createTask method used by the API does not know that this error has occurred and gets stuck in a loop waiting for RenderTemplate to complete, which it never will.
My solution for this is to have a synchronous validation method that calls Fetch for all dependencies and returns any errors. This method would be called after the template has been registered; if there is an error, we'd deregister the template and return the error.
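Roughly, the create-task flow would look like this (sketch only; register, validate, and deregister stand in for the actual CTS/hcat calls):

// Sketch of the intended create-task flow. The register, validate, and
// deregister arguments are placeholders for the real CTS/hcat calls.
func createTaskTemplate(
	tmpl *hcat.Template,
	register func(*hcat.Template) error,
	validate func(*hcat.Template) error,
	deregister func(*hcat.Template),
) error {
	if err := register(tmpl); err != nil {
		return err
	}
	// Synchronously fetch all dependencies once so a Consul error (e.g. an
	// unknown datacenter) is returned here rather than only surfacing on
	// the watcher's error channel.
	if err := validate(tmpl); err != nil {
		// Clean up the template and return the error to the API caller
		// instead of letting createTask hang on RenderTemplate.
		deregister(tmpl)
		return err
	}
	return nil
}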
This is currently still a work in progress because it only works if the check for whether the template isDirty() is removed, and I haven't yet determined the reasoning behind that check.