iterative / cml

♾️ CML - Continuous Machine Learning | CI/CD for ML
http://cml.dev
Apache License 2.0

Self-hosted cml-runner with singularity container #1387

Closed RaghavaAlajangi closed 1 year ago

RaghavaAlajangi commented 1 year ago

I have been trying to set up a self-hosted CML runner on a GPU cluster with Singularity.

Step 1: Build the Singularity image:

singularity build cml_runner.sif dvcorg/cml:latest

Step 2: Run the image with a custom command (cml runner):

singularity exec self_cml_runner.sif cml runner launch --repo=$REPO_URL --token=$REPO_TOKEN --labels="cml_runner" --idle-timeout=10h --driver="gitlab"

But I get the following error:

error: EOVERFLOW: value too large for defined data type, chmod '/home/runner' {"code":"EOVERFLOW","errno":-75,"path":"/home/runner","syscall":"chmod"}

I found on the web that it might be related to file system limitations of the CML Docker image. Has anybody tried to run cml-runner with Singularity and faced a problem like this? If so, could you share some ideas for solving it?
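For reference, the two steps can be sketched as shell functions. This is only a sketch under assumptions: the `docker://` prefix (the URI scheme Singularity uses to pull from Docker Hub) and the shared `IMAGE` variable are not in the commands as posted, which also use two different file names (`cml_runner.sif` and `self_cml_runner.sif`). The flags are copied from step 2 unchanged, and the function names are hypothetical.

```shell
# Sketch only: IMAGE is a hypothetical variable, and the docker:// prefix
# is an assumption that is not present in the command as posted.
IMAGE="${IMAGE:-cml_runner.sif}"

# Step 1: convert the Docker Hub image into a SIF file.
build_image() {
  singularity build "$IMAGE" docker://dvcorg/cml:latest
}

# Step 2: launch the runner from the same SIF file built in step 1.
launch_runner() {
  if [ -z "${REPO_URL:-}" ] || [ -z "${REPO_TOKEN:-}" ]; then
    echo "REPO_URL and REPO_TOKEN must both be set" >&2
    return 1
  fi
  singularity exec "$IMAGE" cml runner launch \
    --repo="$REPO_URL" --token="$REPO_TOKEN" \
    --labels="cml_runner" --idle-timeout=10h --driver="gitlab"
}
```

Calling `build_image && launch_runner` runs both steps against one file name, so the built image and the executed image cannot drift apart.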

dacbd commented 1 year ago

@RaghavaAlajangi I don't think I have seen anyone use Singularity before, and I am unfamiliar with it.

Perhaps you can manually create the cml user and the /home/runner directory instead of inheriting it from the docker image. Is this an option with the singularity build ... command?

RaghavaAlajangi commented 1 year ago

@dacbd thank you for the reply. What I found was that I was trying to obtain the root shell in a non-root instance. So, I followed the steps from this link and managed to launch the cml runner.

However, I now get the following error in the GitLab CI pipeline:

{"level":"error","message":"Unauthorized","stack":"Error: Unauthorized\n    at Gitlab.request (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:565:13)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:54:16\n    at async Promise.all (index 0)\n    at async Gitlab.repoBase (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:45:27)\n    at async Gitlab.projectPath (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:33:22)\n    at async Gitlab.commitPrs (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:108:25)\n    at async parseCommentTarget (/usr/lib/node_modules/@dvcorg/cml/src/commenttarget.js:39:25)\n    at async CML.commentCreate (/usr/lib/node_modules/@dvcorg/cml/src/cml.js:289:20)\n    at async Object.exports.handler (/usr/lib/node_modules/@dvcorg/cml/bin/cml/comment/create.js:11:15)"}

I created an access token with sufficient permissions, but it is still throwing this error.

dacbd commented 1 year ago

@RaghavaAlajangi can you test your token manually with the version endpoint? https://docs.gitlab.com/ee/api/version.html

RaghavaAlajangi commented 1 year ago

@dacbd I tested the token. It seems there is no problem with it.

{"version":"15.11.8","revision":"6d1881b6091"}

RaghavaAlajangi commented 1 year ago

@dacbd I ran the CML runner with both a personal access token and a project access token, and assigned different roles (owner, maintainer, developer). Still, it throws the same error. Could you tell me how I can resolve this issue?

dacbd commented 1 year ago

@RaghavaAlajangi can you double check that you gave the personal access token the correct permissions? https://cml.dev/doc/self-hosted-runners?tab=GitLab#personal-access-token

RaghavaAlajangi commented 1 year ago

Do I have to also save the access token as a variable in CICD?

RaghavaAlajangi commented 1 year ago

@RaghavaAlajangi can you double check that you gave the personal access token the correct permissions? https://cml.dev/doc/self-hosted-runners?tab=GitLab#personal-access-token

@dacbd I checked the permissions and ran the job again, but the issue still persists.

$ cml comment create report.md
{"level":"error","message":"Unauthorized","stack":"Error: Unauthorized\n    at Gitlab.request (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:565:13)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async /usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:54:16\n    at async Promise.all (index 0)\n    at async Gitlab.repoBase (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:45:27)\n    at async Gitlab.projectPath (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:33:22)\n    at async Gitlab.commitPrs (/usr/lib/node_modules/@dvcorg/cml/src/drivers/gitlab.js:108:25)\n    at async parseCommentTarget (/usr/lib/node_modules/@dvcorg/cml/src/commenttarget.js:39:25)\n    at async CML.commentCreate (/usr/lib/node_modules/@dvcorg/cml/src/cml.js:289:20)\n    at async Object.exports.handler (/usr/lib/node_modules/@dvcorg/cml/bin/cml/comment/create.js:11:15)"}
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit status 1

dacbd commented 1 year ago

Do I have to also save the access token as a variable in CICD?

Yes, CML needs to be able to access the token. I'm not sure what else I can do to help you. There is something wrong with the token you are using.

You can try running this curl command from your CI/CD script:

curl --header "PRIVATE-TOKEN: $REPO_TOKEN" "https://gitlab.example.com/api/v4/version"
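A minimal sketch of that check as a reusable function, assuming the job has `curl` available. `GITLAB_URL` and `check_token` are hypothetical names, and `gitlab.example.com` is the placeholder host from the command above. Instead of printing the response body, it reads only the HTTP status, so an unauthorized request is reported explicitly.

```shell
# GITLAB_URL is a hypothetical variable; replace the placeholder host
# with the address of your own GitLab instance.
GITLAB_URL="${GITLAB_URL:-https://gitlab.example.com}"

# Query the version endpoint with whatever REPO_TOKEN holds in this
# environment and report whether GitLab accepted it.
check_token() {
  status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "PRIVATE-TOKEN: ${REPO_TOKEN:-}" "$GITLAB_URL/api/v4/version")
  if [ "$status" = "200" ]; then
    echo "token accepted"
  else
    echo "token rejected (HTTP $status)" >&2
    return 1
  fi
}
```

Running `check_token` inside the CI job rather than on a workstation tests the value the job actually sees, which may differ from the token that was verified by hand.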

RaghavaAlajangi commented 1 year ago

@dacbd The variable has to be named REPO_TOKEN, but I had saved it under a different name, which caused the problem. It was my mistake. Now it's fixed and running well. Thank you for the help.
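A guard like the following, placed before the `cml` call in the CI script, could surface this kind of naming mistake earlier. It is a sketch, and `require_repo_token` is a hypothetical helper, not part of CML. It prints only the length of the value so the token itself never reaches the job log.

```shell
# Fail early when REPO_TOKEN is missing from the job environment,
# for example because the CI/CD variable was saved under another name.
require_repo_token() {
  if [ -z "${REPO_TOKEN:-}" ]; then
    echo "REPO_TOKEN is not set; check the CI/CD variable name" >&2
    return 1
  fi
  # Report the length only, never the secret value.
  echo "REPO_TOKEN is set (${#REPO_TOKEN} characters)"
}
```

In the job script this would be used as `require_repo_token && cml comment create report.md`.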