kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

GitLab Runner Scaler #5616

Open Dentrax opened 6 months ago

Dentrax commented 6 months ago

Proposal

KEDA already supports GitHub Runners. As a self-hosted GitLab user, it'd be great to have support for GitLab Runners.

Scaler Source

https://docs.gitlab.com/runner/

Scaling Mechanics

https://docs.gitlab.com/runner/fleet_scaling/

Authentication Source

https://docs.gitlab.com/ee/ci/runners/configure_runners.html#authentication-token-security

Anything else?

No response

tomkerkhove commented 5 months ago

Good suggestion! Are you willing to contribute this?

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 month ago

This issue has been automatically closed due to inactivity.

fira42073 commented 1 month ago

Hi, folks! C:

I'm willing to contribute to this one, but I'm not sure the implementation details are completely clear to me.

Do I understand correctly that the GitLab API should be polled at this endpoint https://docs.gitlab.com/ee/api/runners.html#list-owned-runners for the number of runners, similar to what https://github.com/kedacore/keda/blob/main/pkg/scalers/github_runner_scaler.go does for GitHub?

Thanks in advance!

JorTurFer commented 1 month ago

Hey! Thanks! 🚀 I think the correct endpoint is the pipelines one -> https://docs.gitlab.com/ee/api/pipelines.html. But yeah, the idea is that KEDA polls the queue length by calling the API with a token.
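
A minimal sketch of that polling idea, not the code from the PR: it assumes the queue length can be approximated by listing a project's pipelines with `status=pending` and authenticating with a `PRIVATE-TOKEN` header. The function names, the example host, and the choice to read only the first page of results are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// pipelinesURL builds the "list project pipelines" URL for one project,
// filtered by pipeline status (for example "pending").
func pipelinesURL(baseURL, projectID, status string) string {
	q := url.Values{}
	q.Set("status", status)
	q.Set("per_page", "100")
	return fmt.Sprintf("%s/api/v4/projects/%s/pipelines?%s",
		baseURL, url.PathEscape(projectID), q.Encode())
}

// countPipelines returns the number of entries in a JSON array response.
func countPipelines(body []byte) (int, error) {
	var pipelines []json.RawMessage
	if err := json.Unmarshal(body, &pipelines); err != nil {
		return 0, fmt.Errorf("decoding pipelines: %w", err)
	}
	return len(pipelines), nil
}

// queueLength fetches one page of pending pipelines and counts them.
// Assumption: a single page is enough; pagination is not handled here.
func queueLength(client *http.Client, baseURL, projectID, token string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, pipelinesURL(baseURL, projectID, "pending"), nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("PRIVATE-TOKEN", token)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return countPipelines(body)
}

func main() {
	// Offline demonstration of the counting step with a canned response.
	n, err := countPipelines([]byte(`[{"id": 1}, {"id": 2}]`))
	fmt.Println(n, err)
	fmt.Println(pipelinesURL("https://gitlab.example.com", "1", "pending"))
}
```

A real scaler would call something like `queueLength` on every polling interval and report the result as its metric value.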

fira42073 commented 1 month ago

Hi, @JorTurFer! I've created a PoC of this solution; could you please take a look at https://github.com/kedacore/keda/pull/6087/files? It's not ready (in terms of tests, Helm charts, docs, etc.), but it does have some logic in there.

If it looks fine, I'll continue with the rest of the required parts.

Thanks in advance!

JorTurFer commented 4 weeks ago

I've taken a look and it looks nice. It's exactly how a scaler works :)

fira42073 commented 3 weeks ago

Hi! I've fixed the previous issues mentioned, and added some tests. Please take a look at your convenience <3

I had to add a new parser for *url.URL, but it wasn't clear to me why this function takes the argument "params":

func setConfigValueHelper(params Params, valFromConfig string, field reflect.Value) error {

in pkg/scalers/scalersconfig/typed_config.go

It seems to be passed through the recursion but never actually used. Maybe I'm mistaken; I'm new to this part of the codebase.

zroubalik commented 3 weeks ago

@wozniakjan FYI

wozniakjan commented 3 weeks ago

hi @fira42073, great job with https://github.com/kedacore/keda/pull/6087!

Regarding params in setConfigValueHelper(): you don't have to pass it to your setConfigValueURL(). A value setter function can have an arbitrary signature; it just happens that most of them need params, so they take it :)
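
To illustrate the point, here is a sketch of a setter that takes only the raw value and the target field. It is not the code from the PR: the `gitlabConfig` struct, its field name, and the `parseGitlabConfig` helper are hypothetical, and only the name `setConfigValueURL` comes from the comment above.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
)

// gitlabConfig is a hypothetical config struct with a *url.URL field.
type gitlabConfig struct {
	GitlabAPIURL *url.URL
}

// setConfigValueURL parses valFromConfig and stores the result in field.
// It takes no params argument because it needs nothing beyond the raw
// value and the field to set.
func setConfigValueURL(valFromConfig string, field reflect.Value) error {
	u, err := url.Parse(valFromConfig)
	if err != nil {
		return fmt.Errorf("unable to parse URL %q: %w", valFromConfig, err)
	}
	if !field.CanSet() || field.Type() != reflect.TypeOf(u) {
		return fmt.Errorf("field of type %s cannot hold a *url.URL", field.Type())
	}
	field.Set(reflect.ValueOf(u))
	return nil
}

// parseGitlabConfig fills the hypothetical struct through reflection,
// the way a generic config parser would reach the field.
func parseGitlabConfig(raw string) (gitlabConfig, error) {
	var cfg gitlabConfig
	field := reflect.ValueOf(&cfg).Elem().FieldByName("GitlabAPIURL")
	err := setConfigValueURL(raw, field)
	return cfg, err
}

func main() {
	cfg, err := parseGitlabConfig("https://gitlab.example.com/api/v4")
	fmt.Println(cfg.GitlabAPIURL, err)
}
```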

JorTurFer commented 3 weeks ago

@fira42073, what do you need for the e2e tests? I'm a total newbie with GitLab, so I don't know whether we have to create a GitLab account, or whether it runs locally and you can deploy an instance directly into the cluster for the e2e test. How can we include e2e tests?

fira42073 commented 2 weeks ago

I think we can totally run it in the cluster. I'm using https://github.com/sameersbn/docker-gitlab?tab=readme-ov-file#quick-start for local testing, maybe it's a good option to use it for e2e tests as well.

The only part I'm not sure about is creating the actual pipelines and checking how they scale. I'm not sure if shared runners are available in self-hosted GitLab. Probably they are not, so we'd need to connect GitLab's runner system to the cluster and let it provision new pods.

After that it should be easy to programmatically create a repo, push a file with .gitlab-ci.yml that will contain an instruction like sleep 300, so the runner is occupied with some pseudowork.

I'm not sure what would be the best option to increase the number of runners tbh. Probably just continuously committing some garbage data, so that it can trigger more pipelines every time would work.

p.s. sorry for late replies, I have my fulltime job during the weekdays, so I usually can contribute only on the weekends

fira42073 commented 2 weeks ago

Hey! I've taken a look at the existing e2e tests and I'm a bit puzzled.

As far as I understand, I'll need to add some setup code to tests/utils/setup_test.go and tests/utils/cleanup_test.go. There I should be able to initialize GitLab itself, so it probably makes more sense to use a Helm chart to do that. I've discovered this one, and it looks quite good.

As far as I understand, I can use ExecuteCommand to run helm install for this GitLab chart. I'm a bit confused about how it is decided whether a test runs or not.

I see there are some values taken from the env:

    AzureRunWorkloadIdentityTests = os.Getenv("AZURE_RUN_WORKLOAD_IDENTITY_TESTS")
    AwsIdentityTests              = os.Getenv("AWS_RUN_IDENTITY_TESTS")
    GcpIdentityTests              = os.Getenv("GCP_RUN_IDENTITY_TESTS")

From which I can conclude that these values determine if the test will be run or not. But I couldn't figure out where they are actually being set.

The only other occurrence is in Makefile:deploy. But as far as I understand that's unrelated, because there are some other actions happening that are related somehow to kustomize.
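
For illustration, a sketch of how a flag like these could gate a test. The variable name `GITLAB_RUN_E2E_TESTS` is hypothetical, and the rule that an empty value or `false` means "disabled" is an assumption rather than something confirmed in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// gitlabTestsEnvVar is a hypothetical variable name that follows the
// pattern of the variables quoted above.
const gitlabTestsEnvVar = "GITLAB_RUN_E2E_TESTS"

// testsEnabled reports whether a flag value enables the tests.
// Assumption: an unset or empty value, or "false", means disabled.
func testsEnabled(value string) bool {
	return value != "" && value != "false"
}

func main() {
	if !testsEnabled(os.Getenv(gitlabTestsEnvVar)) {
		fmt.Println("skipping GitLab e2e tests")
		return
	}
	fmt.Println("running GitLab e2e tests")
}
```

Inside a Go test the same check would normally end in `t.Skip(...)` instead of printing.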


I've figured that actually creating the instance using the chart is pretty easy:

Deploy

# Instructions per https://docs.gitlab.com/charts/development/kind/#nginx-ingress-nodeport-with-ssl
helm repo add gitlab https://charts.gitlab.io/
helm repo update

# with ssl
kind create cluster --config examples/kind/kind-ssl.yaml
helm upgrade --install gitlab gitlab/gitlab \
  --set global.hosts.domain=local.gd \
  --set global.edition=ce \
  -f examples/kind/values-base.yaml \
  -f examples/kind/values-ssl.yaml

firefox http://gitlab.local.gd/

Get secret

The username is root; the password is:

kubectl get secret <name>-gitlab-initial-root-password -ojsonpath='{.data.password}' | base64 --decode ; echo

# with the release name used in the commands above (gitlab)
kubectl get secret gitlab-gitlab-initial-root-password -ojsonpath='{.data.password}' | base64 --decode ; echo

I figured out that the runners are already installed as part of the chart and there is no need to install them separately, so the next instructions are struck through because they are no longer relevant.

Runner

Create CRDs

curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.28.0/install.sh | bash -s v0.28.0

kubectl create -f https://operatorhub.io/install/gitlab-runner-operator.yaml

### Registering a new PAT
```bash
# generate a random value to use as the PAT
echo $RANDOM | shasum | head -c 30

# create the token, replacing the placeholder with the value from the previous step
kubectl exec -it -c toolbox deploy/gitlab-toolbox -- gitlab-rails runner "token = User.find_by_username('root').personal_access_tokens.create(scopes: ['create_runner'], name: 'create_runner_pat', expires_at: 1.days.from_now); token.set_token('REPLACE_ME_WITH_PREVIOUS_STEP_VALUE'); token.save!"

# which results in something like
kubectl exec -it -c toolbox deploy/gitlab-toolbox -- gitlab-rails runner "token = User.find_by_username('root').personal_access_tokens.create(scopes: ['create_runner'], name: 'create_runner_pat', expires_at: 1.days.from_now); token.set_token('b2c6552d23ed47548549ceceed07e2'); token.save!"
```

Create runner itself

---
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-runner-secret
type: Opaque
# Only one of the following fields can be set. The Operator fails to register the runner if both are provided.
# NOTE: runner-registration-token is deprecated and will be removed in GitLab 18.0. You should use runner-token instead.
stringData:
  runner-token: b2c6552d23ed47548549ceceed07e2  # your project runner token from previous step
  # runner-registration-token: ""  # your project runner secret
---
apiVersion: apps.gitlab.com/v1beta2
kind: Runner
metadata:
  name: gitlab-runner
spec:
  gitlabUrl: https://gitlab.local.gd
  buildImage: alpine
  token: gitlab-runner-secret

Register runner

curl -k --request POST --header "PRIVATE-TOKEN: b2c6552d23ed47548549ceceed07e2" --data "runner_type=instance_type"  "https://gitlab.local.gd/api/v4/user/runners"

As for adding the runners, I still haven't figured out how to do that properly, because I'm facing some issues.

The runners fail to connect to the GitLab instance. That's probably related to DNS, because the gitlab-runner pod crashes with:
ERROR: Registering runner... failed runner=JD6wIp1U status=couldn't execute POST against https://gitlab.local.gd/api/v4/runners: Post "https://gitlab.local.gd/api/v4/runners": dial tcp 127.0.0.1:443: connect: connection refused

I've stumbled upon this project, but it looks dead. Also it's probably better to use real runners instead of mocking them. After all, it's e2e C:

Once that gap is filled, I should be able to create a ScaledObject that references the runners' deployment.

Then I somehow need to trigger several pipelines that won't finish immediately, but will keep running for some time while the tests are being run.

Mock pipeline:

.gitlab-ci.yml

# This file is a template, and might need editing before it works on your project.
# This is a sample GitLab CI/CD configuration file that should run without any modifications.
# It demonstrates a basic 3 stage CI/CD pipeline. Instead of real tests or scripts,
# it uses echo commands to simulate the pipeline execution.
#
# A pipeline is composed of independent jobs that run scripts, grouped into stages.
# Stages run in sequential order, but jobs within stages run in parallel.
#
# For more information, see: https://docs.gitlab.com/ee/ci/yaml/index.html#stages
#
# You can copy and paste this template into a new `.gitlab-ci.yml` file.
# You should not add this template to an existing `.gitlab-ci.yml` file by using the `include:` keyword.
#
# To contribute improvements to CI/CD templates, please follow the Development guide at:
# https://docs.gitlab.com/ee/development/cicd/templates.html
# This specific template is located at:
# https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Getting-Started.gitlab-ci.yml
stages:          # List of stages for jobs, and their order of execution
  - build
  - test
  - deploy
build-job:       # This job runs in the build stage, which runs first.
  stage: build
  script:
    - echo "Compiling the code..."
    - echo "Compile complete."
unit-test-job:   # This job runs in the test stage.
  stage: test    # It only starts when the job in the build stage completes successfully.
  script:
    - echo "Running unit tests... This will take about 60 seconds."
    - sleep 60
    - echo "Code coverage is 90%"
lint-test-job:   # This job also runs in the test stage.
  stage: test    # It can run at the same time as unit-test-job (in parallel).
  script:
    - echo "Linting code... This will take about 10 seconds."
    - sleep 10
    - echo "No lint issues found."
deploy-job:      # This job runs in the deploy stage.
  stage: deploy  # It only runs when *both* jobs in the test stage complete successfully.
  environment: production
  script:
    - echo "Deploying application..."
    - echo "Application successfully deployed."

I'll keep you posted on the progress.

fira42073 commented 2 weeks ago

Okay, my first guess about DNS was right: the pod was trying to reach its own localhost, so the request failed. Fixing the issue was as easy as substituting local.gd with nip.io and my own local IP address. It's not clear to me how this is going to work in CI for the e2e test.

The rest of the steps look roughly like this. If you run all of this locally, substitute your own local IP address in https://gitlab.192.168.1.165.nip.io. Again, I'm not sure how this is going to look in CI for e2e.

Deploy

# Instructions per https://docs.gitlab.com/charts/development/kind/#nginx-ingress-nodeport-with-ssl
helm repo add gitlab https://charts.gitlab.io/
helm repo update

# with ssl
kind create cluster --config examples/kind/kind-ssl.yaml
helm upgrade --install gitlab gitlab/gitlab \
  --set global.hosts.domain=192.168.1.165.nip.io \
  --set global.edition=ce \
  -f examples/kind/values-base.yaml \
  -f examples/kind/values-ssl.yaml

firefox https://gitlab.192.168.1.165.nip.io/

Here we also need a mechanism to check that gitlab is up and the api is not returning 502 anymore. Regular polling every 10 seconds or so might work well.
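
A sketch of such a readiness check, under stated assumptions: the probed URL is whatever endpoint the test chooses, the caller supplies an HTTP client that trusts the instance's certificate, and the names `waitUntilReady` and `httpProbe` are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// waitUntilReady calls probe until it reports HTTP 200 or attempts run out,
// sleeping for interval between attempts.
func waitUntilReady(probe func() (int, error), interval time.Duration, attempts int) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		status, err := probe()
		if err == nil && status == http.StatusOK {
			return nil
		}
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("unexpected status %d", status)
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("not ready after %d attempts: %w", attempts, lastErr)
}

// httpProbe returns a probe that issues a GET request against target.
func httpProbe(client *http.Client, target string) func() (int, error) {
	return func() (int, error) {
		resp, err := client.Get(target)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		return resp.StatusCode, nil
	}
}

func main() {
	// Simulated probe: two 502 responses, then a 200.
	calls := 0
	fake := func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusBadGateway, nil
		}
		return http.StatusOK, nil
	}
	err := waitUntilReady(fake, 10*time.Millisecond, 5)
	fmt.Println(err, calls)
}
```

Against a real instance the call would look like `waitUntilReady(httpProbe(client, apiURL), 10*time.Second, 60)`, where `client` and `apiURL` are supplied by the test.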

(Optional) Get secret to access ui on https://gitlab.192.168.1.165.nip.io/

The username is root; the password is:

kubectl get secret <name>-gitlab-initial-root-password -ojsonpath='{.data.password}' | base64 --decode ; echo

# with the release name used in the commands above (gitlab)
kubectl get secret gitlab-gitlab-initial-root-password -ojsonpath='{.data.password}' | base64 --decode ; echo

Create repo and add ci file

# create personal access token
kubectl exec -it -c toolbox deploy/gitlab-toolbox -- gitlab-rails runner "token = User.find_by_username('root').personal_access_tokens.create(scopes: ['read_repository', 'write_repository', 'api', 'read_api'], name: 'create_runner_pat', expires_at: 1.days.from_now); token.set_token('static_pat_123'); token.save!"

# new repo
curl -k --request POST --header "PRIVATE-TOKEN: static_pat_123" \
     --header "Content-Type: application/json" --data '{
        "name": "new_project", "description": "New Project", "path": "new_project", "initialize_with_readme": "false"}' \
     --url "https://gitlab.192.168.1.165.nip.io/api/v4/projects/"

# add .gitlab-ci.yml file
curl -k --request POST \
--header 'PRIVATE-TOKEN: static_pat_123' \
--header "Content-Type: application/json" \
--data "{\"branch\": \"main\", \"author_email\": \"author@example.com\", \"author_name\": \"Firstname Lastname\", \"content\": \"stages: [deploy]\\ndeploy-job:\\n  stage: deploy\\n  script: [\\\"sleep 600\\\"]\", \"commit_message\": \"create a new file\"}" \
--url "https://gitlab.192.168.1.165.nip.io/api/v4/projects/1/repository/files/.gitlab-ci.yml"
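
In a Go e2e test, the hand-escaped JSON in the last command could be built with `encoding/json` instead. The sketch below mirrors the fields of that request; the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filePayload mirrors the JSON fields sent by the curl command above.
type filePayload struct {
	Branch        string `json:"branch"`
	AuthorEmail   string `json:"author_email"`
	AuthorName    string `json:"author_name"`
	Content       string `json:"content"`
	CommitMessage string `json:"commit_message"`
}

// ciFileContent returns a one-job pipeline whose job sleeps for the given
// number of seconds, so a runner stays busy for that long.
func ciFileContent(sleepSeconds int) string {
	return fmt.Sprintf(
		"stages: [deploy]\ndeploy-job:\n  stage: deploy\n  script: [\"sleep %d\"]",
		sleepSeconds)
}

// newCIFilePayload assembles the request body for creating .gitlab-ci.yml.
func newCIFilePayload(branch string, sleepSeconds int) filePayload {
	return filePayload{
		Branch:        branch,
		AuthorEmail:   "author@example.com",
		AuthorName:    "Firstname Lastname",
		Content:       ciFileContent(sleepSeconds),
		CommitMessage: "create a new file",
	}
}

func main() {
	body, err := json.Marshal(newCIFilePayload("main", 600))
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	// The marshaller takes care of escaping quotes and newlines.
	fmt.Println(string(body))
}
```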

fira42073 commented 2 weeks ago

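I guess currently there are a couple of questions to answer:

  • how to actually perform all of these setup actions for the cases when gitlab-runner is tested in the e2e tests.
  • how to determine which local ip to use in the instance, since localhost would not work
  • figure out if this setup is going to work on the e2e test runners that are currently being used, because this was tested using KinD. there may be additional challenges with the real ci runner.
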
These are the challenging points I see for now. Please let me know if you have any ideas about overcoming these.

Thanks in advance! <3

JorTurFer commented 2 weeks ago

Hello! Sorry for the slow response :(

  • how to actually perform all of these setup actions for the cases when gitlab-runner is tested in the e2e tests.

I think that GitLab is something scoped within the test context, so I'd spin up the GitLab env at the beginning of the test and just delete it at the end, rather than doing that with the global setup/cleanup.
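
A sketch of that test-scoped lifecycle, under assumptions: the release and chart names come from the Helm commands earlier in the thread, while the namespace `gitlab-e2e` and all function names are hypothetical. The command runner is injected so the install, test, uninstall ordering can be exercised without a cluster.

```go
package main

import (
	"fmt"
	"os/exec"
	"sort"
)

// commandRunner abstracts command execution.
type commandRunner func(name string, args ...string) error

// execRunner runs a real command and includes its output in any error.
func execRunner(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s failed: %w: %s", name, err, out)
	}
	return nil
}

// helmInstallArgs builds the arguments for "helm upgrade --install".
func helmInstallArgs(release, chart, namespace string, values map[string]string) []string {
	args := []string{"upgrade", "--install", release, chart,
		"--namespace", namespace, "--create-namespace", "--wait"}
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic flag order
	for _, k := range keys {
		args = append(args, "--set", k+"="+values[k])
	}
	return args
}

// withGitLab installs the chart, runs body, and always uninstalls afterwards.
func withGitLab(run commandRunner, body func() error) (err error) {
	const release, chart, namespace = "gitlab", "gitlab/gitlab", "gitlab-e2e"
	install := helmInstallArgs(release, chart, namespace,
		map[string]string{"global.edition": "ce"})
	if ierr := run("helm", install...); ierr != nil {
		return ierr
	}
	defer func() {
		cerr := run("helm", "uninstall", release, "--namespace", namespace)
		if cerr != nil && err == nil {
			err = cerr
		}
	}()
	return body()
}

func main() {
	// Dry run: print the commands instead of executing them.
	printer := func(name string, args ...string) error {
		fmt.Println(name, args)
		return nil
	}
	err := withGitLab(printer, func() error {
		fmt.Println("running the scaler test")
		return nil
	})
	fmt.Println("done:", err)
}
```

Passing `execRunner` instead of the printing runner would execute the Helm commands for real.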

  • how to determine which local ip to use in the instance, since localhost would not work

Isn't there a kubernetes service that you can use?

  • figure out if this setup is going to work on the e2e test runners that are currently being used, because this was tested using KinD. there may be additional challenges with the real ci runner.

In theory based on docs, we should be able to spin up a real GitLab runner and connect it to the GitLab instance -> https://docs.gitlab.com/runner/. In any case, if the changes are uploaded, I can trigger the e2e test in our testing cluster and check how it goes

fira42073 commented 2 weeks ago

Hey! No rush C:

I think that GitLab is something scoped within the test context, so I'd spin up the GitLab env at the beginning of the test and just delete it at the end, rather than doing that with the global setup/cleanup.

If some other scaler already does such setup, could you please refer me to where I can see that?

Isn't there a kubernetes service that you can use?

That's the thing C: It creates its own signed certificate, and I need to use some kind of fixed host. I'm using nip.io, which is a DNS server that returns localhost for localhost.nip.io or 192.168.0.1 for 192.168.0.1.nip.io. That gives me an FQDN that can be covered by a self-issued certificate.

In theory based on docs, we should be able to spin up a real GitLab runner and connect it to the GitLab instance -> https://docs.gitlab.com/runner/. In any case, if the changes are uploaded, I can trigger the e2e test in our testing cluster and check how it goes

Yeah, checking it on the end cluster where it will be running will definitely be helpful!

JorTurFer commented 2 weeks ago

If some other scaler already does such setup, could you please refer me to where I can see that?

Sure! This example creates the selenium hub from scratch using raw manifests and this other example deploys IBMMQ server using helm

It creates its own signed certificate, and I need to use some kind of fixed host. I'm using nip.io, which is a DNS server that returns localhost for localhost.nip.io or 192.168.0.1 for 192.168.0.1.nip.io. That gives me an FQDN that can be covered by a self-issued certificate.

Yeah, but you can use the service as the fixed host if the backend exposes TLS. I mean, you can access the workload through the service by calling https://service_name.namespace. If you set the host to service_name.namespace, the caller will accept the certificate served for service_name.namespace.

fira42073 commented 2 weeks ago

Thanks for the examples, I'll look into how I can replicate a similar setup for gitlab as well. Looks easy.

Yeah, but you can use the service as the fixed host if the backend exposes TLS. I mean, you can access the workload through the service by calling https://service_name.namespace. If you set the host to service_name.namespace, the caller will accept the certificate served for service_name.namespace.

This is probably the best solution in this context, for some reason I didn't even think of it. Using k8s service fqdn totally works.
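
As a small sketch of that approach, the in-cluster base URL can be derived from the service name and namespace. The service name `gitlab-webservice` and the namespace `gitlab-e2e` below are placeholders, not values confirmed for the chart.

```go
package main

import "fmt"

// serviceHost returns the in-cluster DNS name <service>.<namespace>.
func serviceHost(service, namespace string) string {
	return service + "." + namespace
}

// serviceURL returns an HTTPS base URL for the service. The port is omitted
// when it is 0 or the HTTPS default.
func serviceURL(service, namespace string, port int) string {
	host := serviceHost(service, namespace)
	if port == 0 || port == 443 {
		return "https://" + host
	}
	return fmt.Sprintf("https://%s:%d", host, port)
}

func main() {
	// Placeholder names; the real ones depend on how the chart is installed.
	fmt.Println(serviceURL("gitlab-webservice", "gitlab-e2e", 443))
}
```

The same host name would also have to be the one GitLab is configured with, so that the served certificate matches it.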

I'll do my best to carve out some time for it this weekend. Thanks!

JorTurFer commented 2 weeks ago

I'll do my best to carve out some time for it this weekend. Thanks!

No rush at all. We really appreciate contributions, but OSS is OSS. Take your time and don't feel pressured by any expected date :)