xdevxy commented 6 months ago

Summary

Extract the current GitOps watching logic so that it can be shared by both the controller and the agent. This also includes the logic for watching the control repo.

Use Cases

A dedicated watcher for changes in repos

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

shubhamdixit863 commented 4 months ago

@xdevxy @juliev0 I reviewed the code in syncer.go and found that we have four scenarios:

Monitoring Changes: Detecting changes in the Git repository that need to be reflected in the Kubernetes cluster, utilizing the StartWatching, StopWatching, Contains, and Length functions.
Scheduling: Deciding when to check for updates and potentially queue them for processing, using the StartFunction.
Dispatching: Distributing tasks to worker processes or threads based on detected changes, using the run function.
Applying Changes: Actually applying the changes to the Kubernetes cluster, using the runOnce function.

I think the Monitoring, Scheduling, and Dispatching logic together make up the watcher logic, so I propose decoupling them from the rest and making them easily pluggable into other code. Please share your views on this.

juliev0 commented 4 months ago

Hey @shubhamdixit863. Good research.

I did want to clarify some things. I just realized this Issue mentions the Agent too. The Agent code already has a skeleton. For the Agent there is just a single Git definition in the ConfigMap, which is evaluated (it could be templated) and then periodically fetched/synced. You can probably focus more on the Modular Watcher in the Controller than on the Agent, as the Agent part will be done later. I think the main thing for the Agent will be that we ultimately need the ability to run cloning/fetching type code without passing a GitSync into the functions...so perhaps making some of those functions more generic.
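To illustrate the "more generic" idea, here is a minimal sketch of a fetch-type helper that takes plain strings instead of a GitSync. It is not the existing git/util code: the names `LatestCommit` and `parseLsRemote` are hypothetical, and it assumes a `git` binary on the PATH and shells out to `git ls-remote` rather than using whatever Git library the project uses.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// parseLsRemote returns the commit hash from the first well-formed line of
// `git ls-remote` output, where each line is "<hash>\t<ref>".
// Hypothetical helper, not part of the existing code base.
func parseLsRemote(out string) (string, error) {
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 {
			return fields[0], nil
		}
	}
	return "", fmt.Errorf("no matching ref in ls-remote output")
}

// LatestCommit resolves a branch or tag name to a commit hash for any
// repository URL. It takes plain strings, so both a GitSync-based caller
// and a ConfigMap-based caller could use it.
// Assumptions: `git` is on the PATH, and revision is a ref name
// (a raw commit hash cannot be resolved this way).
func LatestCommit(ctx context.Context, repoURL, revision string) (string, error) {
	out, err := exec.CommandContext(ctx, "git", "ls-remote", repoURL, revision).Output()
	if err != nil {
		return "", fmt.Errorf("git ls-remote %s %s: %w", repoURL, revision, err)
	}
	return parseLsRemote(string(out))
}
```

If several refs match the pattern (for example a branch and a tag with the same name), this sketch simply takes the first line, so a real implementation would want a stricter match.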

That generic clone/fetch capability is what the Agent has in common with the Modular Watcher for the Controller. The Modular Watcher is basically a set of repository paths, defined in the ConfigMap, that we watch. (Or maybe for simplicity it should only be one, @xdevxy?) Also, I suppose it will only consist of GitSyncs, so theoretically it could use only the raw type, if that makes things easier (or, if you want to reuse the existing functionality, maybe it's easier for it to follow the same model as our GitSync RepositoryPaths and allow all types).

So, regarding your findings above, I actually think it could be a lot simpler than this, since we will either have just one path that we watch or a small handful. (Of course, we'd still need to account for ConfigMap changes.)

Let me know if you disagree with any of this @xdevxy. Thanks!

xdevxy commented 4 months ago

Overall, the modular watcher requires the following functionality:

Watch the specified repo path at a target revision and check whether there are any new changes.
If there are, apply the latest change.

This logic already exists in the syncer, along with the Git operations in git/util. We want to make it modular so the code can be shared by both the control repo watcher and the user repo watcher (the user repos are the ones referred to by GitSync).

On top of that, the way the repo path and target revision are specified differs between the two cases.

For watching user repos, that information is defined in the GitSync spec.
For watching the control repo, as @juliev0 mentioned above, that information is defined in the ConfigMap, and we need to account for any ConfigMap changes.
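One way to picture that difference is a small interface with one implementation per source. This is only a sketch under assumptions: `GitSyncSpec` here is a stand-in with made-up fields rather than the real GitSync type, the ConfigMap keys (`repoUrl`, `path`, `targetRevision`) are invented for the example, and `SameTargets` is a hypothetical helper for noticing ConfigMap changes.

```go
package main

import "fmt"

// Target is the source-independent description of what to watch.
type Target struct {
	RepoURL        string
	Path           string
	TargetRevision string
}

// TargetSource hides where the watch definition comes from.
type TargetSource interface {
	Targets() ([]Target, error)
}

// GitSyncSpec is a simplified stand-in, NOT the real GitSync spec type.
type GitSyncSpec struct {
	RepoURL        string
	Path           string
	TargetRevision string
}

// GitSyncSource yields the target for a user repo.
type GitSyncSource struct{ Spec GitSyncSpec }

func (s GitSyncSource) Targets() ([]Target, error) {
	return []Target{{s.Spec.RepoURL, s.Spec.Path, s.Spec.TargetRevision}}, nil
}

// ConfigMapSource yields the target for the control repo from ConfigMap-style
// string data. The key names are assumptions made for this example.
type ConfigMapSource struct{ Data map[string]string }

func (s ConfigMapSource) Targets() ([]Target, error) {
	url, ok := s.Data["repoUrl"]
	if !ok || url == "" {
		return nil, fmt.Errorf("config is missing repoUrl")
	}
	return []Target{{url, s.Data["path"], s.Data["targetRevision"]}}, nil
}

// SameTargets reports whether two target lists are identical, so a watcher
// can be restarted only when the ConfigMap content actually changed.
func SameTargets(a, b []Target) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```

With this shape, the watch loop only ever sees `Target` values, and a ConfigMap update is handled by re-reading `Targets()` and comparing the result with the previous list.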

numaproj-labs / numaplane

Modular Watcher #166
