woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

[Summary] Kubernetes backend support #1513

Closed: 6543 closed this issue 11 months ago

6543 commented 1 year ago

Basic support (#9) was added with #552.

Current state:

maltegrosse commented 1 year ago

@6543 Great work! Can I test it somehow? Is it possible to include other resources in it, e.g. NVIDIA GPUs?

6543 commented 1 year ago

Hmm @maltegrosse, at the moment I would say the best help is to test it and point out its limitations/issues.

Passthrough hardware acceleration like GPUs is something I never thought of. If it's about helping via $, we have an Open Collective account.

maltegrosse commented 1 year ago

@6543 Is there different behavior between cpu/mem resources and other resources available on the node?
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

e.g.:

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/gpu/gpu.go#L113

vs.

https://github.com/woodpecker-ci/woodpecker/blob/3b0263442a23bfa7906c54601adb39af8463c2b0/pipeline/backend/kubernetes/pod.go#L74

6543 commented 1 year ago

I guess GPUs are just not yet taken into account, but that's an interesting use case.

Dan6erbond commented 1 year ago

@maltegrosse as long as the cluster has device plugins providing resources such as GPUs, it's pretty much the same as defining a resource limit for CPU/memory.
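
For illustration, a minimal sketch of the Kubernetes side, assuming a node where a device plugin advertises nvidia.com/gpu (pod name and image are placeholders, not anything Woodpecker generates):

```yaml
# Sketch: requesting a GPU via an extended resource exposed by a device plugin.
# The extended resource sits in the limits block right next to cpu and memory.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-step-example
spec:
  restartPolicy: Never
  containers:
    - name: build
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
          nvidia.com/gpu: 1   # scheduled like cpu/memory; requests are implied equal to limits
```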

maltegrosse commented 11 months ago

@6543 I'm seeing great progress in the k8s backend support. I think I want to give it a try and upgrade from 0.x to the latest stable version. But I just saw that 2.0 is in the pipeline. Would you recommend waiting for the new 2.x major release? If yes, is there any ETA?

qwerty287 commented 11 months ago

You're right that 2.0 is in progress, but I don't think it's a bad idea to upgrade to 1.0 first. 2.0 will contain some breaking changes, but not as many as 1.0 did.

We're currently somewhat stuck at #2476, but after that is done we would like to release 2.0. This can take anywhere from a week to a month; there's no fixed ETA.

pat-s commented 11 months ago

WRT k8s specifically: I'd say it's usable for production needs; at least I use it that way across many instances and a few dozen repos with some complex configs.

maltegrosse commented 11 months ago

Sounds nice, thanks for the feedback @pat-s. The only point that confuses me is the resource limits. It seems like every step requires a resources definition, or can I simply add it to an agent globally using normal k8s syntax (so it will be applied to any job)? See https://woodpecker-ci.org/docs/next/administration/backends/kubernetes#resources. The reason is that I don't trust my users to assume/predict proper resource usage :)

Additionally, are there any breaking changes regarding my DB (Postgres)? (Currently using 0.15.6.)

qwerty287 commented 11 months ago

> I couldn't find anything at https://woodpecker-ci.org/docs/next/migrations

This means there's nothing. Of course, some DB migrations will run, but Woodpecker handles this automatically on the first start after the update.

pat-s commented 11 months ago

> It seems like every step requires a resources definition

Yes, that's the case and probably won't change in the future.

> or can I simply add it to an agent globally using normal k8s syntax?

No, it must be added to each pipeline and its steps. Also, it's not about the runner (which is a separate deployment) but about the pods spawned by the runner; these are defined by the respective pipelines.

> The reason is that I don't trust my users to assume/predict proper resource usage

By default, resources are not set (as is the case for any other k8s resource). And yeah, teaching is needed. I feel you though, I have the same issues with users in my environment ;)
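
To make the per-step part concrete, here is a sketch along the lines of the resources section of the docs page linked above (step name, image, and numbers are just placeholders):

```yaml
# Sketch of a per-step resource definition for the Kubernetes backend.
steps:
  build:
    image: golang:1.21
    commands:
      - go build ./...
    backend_options:
      kubernetes:
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 512Mi
```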

maltegrosse commented 11 months ago

Thank you @pat-s. Have you played with Resource Quotas? I haven't tried them yet, but they could somehow limit the damage :-)
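
Something like this sketch is what I have in mind (namespace name and numbers are assumptions), although, as quoted further down, a cpu/memory quota then forces every new pod to declare requests/limits:

```yaml
# Sketch: cap the aggregate resources of all pods in the namespace the
# agent schedules step pods into (namespace and numbers are assumptions).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pipeline-quota
  namespace: woodpecker
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 32Gi
    limits.cpu: "32"
    limits.memory: 64Gi
    pods: "50"
```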

pat-s commented 11 months ago

Seems like an interesting idea; maybe we can implement this in the Helm chart so it can be applied across the namespace WP is running in. Thanks for sharing the idea!

maltegrosse commented 11 months ago

And setting up a default resource definition for each step is not an option at all for WP?

As the Resource Quotas docs mention:

> For cpu and memory resources, ResourceQuotas enforce that every (new) pod in that namespace sets a limit for that resource. If you enforce a resource quota in a namespace for either cpu or memory, you, and other clients, must specify either requests or limits for that resource for every new Pod you submit. If you don't, the control plane may reject admission for that Pod.

Or are Limit Ranges meant to solve exactly that issue? It seems so, looking at the first example.

6543 commented 11 months ago

Closing this, as we should have full support now. If there are still issues, they are considered normal bugs :)

maltegrosse commented 10 months ago

@pat-s I finally upgraded to WP 2 on Kubernetes and it works great (including resource limits for GPU)!

Resources are limited by a LimitRange:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: compute-limits
  namespace: woodpecker
spec:
  limits:
  - default:
      cpu: 12
      memory: 40Gi
      nvidia.com/mig-2g.20gb: 1
    type: Container
```

thank you all again!

6543 commented 10 months ago

nice :heart:

I'll lock this issue as we now have Kubernetes support :) Future interactions should go into new issues.