overhangio / tutor

The Docker-based Open edX distribution designed for peace of mind
https://docs.tutor.overhang.io/
GNU Affero General Public License v3.0

`patchStrategicMerge` doesn't work with jobs #791

Closed: keithgg closed this issue 1 year ago

keithgg commented 1 year ago

Hi @regisb

I couldn't think of a decent way to fix this, so I'm checking if you have given it any thought.

When creating a patchStrategicMerge for jobs (specifically the forum job), I get the following error:

error: no matches for Id Deployment.v1.apps/forum.[noNs]; failed to find unique target for patch Deployment.v1.apps/forum.[noNs]

This is because Tutor regenerates the jobs.yml file on every run.

You can find an example in our Grove plugin. It is only there for illustration: what I'm actually trying to accomplish is to add a volumeMount to the pod.
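For illustration, the Python sketch below generates a strategic-merge patch of the kind described here, which adds a volumeMount to the forum job's pod. It is a hypothetical example, not the actual patch from the Grove plugin. The job name `forum-job`, the container name `forum`, the volume and ConfigMap name `extra-config`, and the mount path are all assumptions. The dict is dumped as JSON, which is also valid YAML, so the output could be saved as a patch file.

```python
import json


def volume_mount_patch(job_name, container_name, volume_name, mount_path, config_map):
    """Build a strategic-merge patch that adds one volume and its mount to a Job."""
    return {
        # apiVersion, kind and metadata.name identify the patch target
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "template": {
                "spec": {
                    # Containers are merged by "name", so only the new mount is listed
                    "containers": [
                        {
                            "name": container_name,
                            "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
                        }
                    ],
                    # Volumes are also merged by "name"
                    "volumes": [{"name": volume_name, "configMap": {"name": config_map}}],
                }
            }
        },
    }


# Hypothetical values, for illustration only
patch = volume_mount_patch("forum-job", "forum", "extra-config", "/openedx/extra", "extra-config")
print(json.dumps(patch, indent=2))
```

A patch like this currently fails to reach the job, which is the behaviour reported in this issue.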

To reproduce the error:

pip install --upgrade "tutor[full]"
tutor config save
pip install --upgrade git+https://gitlab.com/opencraft/dev/tutor-contrib-grove@keith/tutor-override-test
tutor plugins enable grove
tutor config save
tutor k8s launch

regisb commented 1 year ago

The fact that plugin developers cannot modify jobs is an issue.

I believe that the root of the problem is that jobs are loaded directly in Python, with load_job(name), thus bypassing kubectl's strategic merge.
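To make this concrete, here is a toy model of a strategic merge in Python. It is a simplified sketch, not kubectl's implementation. Real strategic merge reads merge keys from the Kubernetes API schema and supports directives such as `$patch`. The merge keys in `MERGE_KEYS` match the Kubernetes ones for those fields. The job, container and image names are hypothetical. A job loaded straight from the template corresponds to `base_job`, which never sees the patch. Only the merged result contains the extra mount.

```python
from copy import deepcopy

# Merge keys for a few list fields, matching the Kubernetes API schema
MERGE_KEYS = {"containers": "name", "env": "name", "volumes": "name", "volumeMounts": "mountPath"}


def strategic_merge(base, patch):
    """Toy strategic merge: dicts merge recursively, known lists merge by key."""
    result = deepcopy(base)
    for field, value in patch.items():
        current = result.get(field)
        key = MERGE_KEYS.get(field)
        if isinstance(current, dict) and isinstance(value, dict):
            result[field] = strategic_merge(current, value)
        elif key and isinstance(current, list) and isinstance(value, list):
            # Index existing items by merge key, then merge or append patch items
            merged = {item[key]: item for item in current}
            for item in value:
                old = merged.get(item[key])
                merged[item[key]] = deepcopy(item) if old is None else strategic_merge(old, item)
            result[field] = list(merged.values())
        else:
            # Scalars and lists without a merge key are replaced outright
            result[field] = deepcopy(value)
    return result


# Hypothetical job, as it would come straight from the jobs.yml template
base_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "forum-job"},
    "spec": {"template": {"spec": {"containers": [{
        "name": "forum",
        "image": "forum:latest",
        "volumeMounts": [{"name": "config", "mountPath": "/openedx/config"}],
    }]}}},
}

# Hypothetical plugin patch that only adds one mount to the "forum" container
patch = {"spec": {"template": {"spec": {"containers": [{
    "name": "forum",
    "volumeMounts": [{"name": "extra", "mountPath": "/openedx/extra"}],
}]}}}}

patched = strategic_merge(base_job, patch)
mounts = patched["spec"]["template"]["spec"]["containers"][0]["volumeMounts"]
print([m["mountPath"] for m in mounts])  # ['/openedx/config', '/openedx/extra']
```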

We could resolve this issue by loading the job definitions from the output of kubectl kustomize <env>. Something like:

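import subprocess
import yaml

def load_job(env_dir, name):
    resources = subprocess.check_output(["kubectl", "kustomize", env_dir])  # applies patches
    for resource in yaml.safe_load_all(resources):
        if resource and resource["kind"] == "Job" and resource["metadata"]["name"] == name:
            return resource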

Do you think this would be a solution for you? If yes, would you like to open a PR or should I do it?

That being said, I am fully aware that the way Tutor jobs are handled in k8s is awful. I wish we had a better solution. My problem is that I do not know what the "right" way to run init tasks in k8s is. Do you have an opinion, @keithgg?

keithgg commented 1 year ago

Thanks @regisb. I don't have any sense of what the "right" way here is either, but I'm happy to implement your suggestion and look into alternative solutions.

I'll schedule a task in my next sprint to do the PR.