issues
sustainable-computing-io/kepler-metal-ci
Testing different CI and GitHub Actions pipelines and publishing test results
https://sustainable-computing-io.github.io/kepler-metal-ci/
Apache License 2.0
0 stars, 11 forks
fix(add-train-logs) Add .log extension to train logs
#334
KaiyiLiu1234
closed
5 hours ago
0
feat(upload-train-logs): Upload train logs to repo
#333
KaiyiLiu1234
closed
8 hours ago
0
chore: set validator runtime
#332
rootfs
opened
9 hours ago
1
Validation with Model Server fails due to Estimator
#331
KaiyiLiu1234
opened
1 day ago
0
feat(prometheus_snapshot): Upload prometheus snapshot for trainer workflows
#330
KaiyiLiu1234
closed
10 hours ago
1
chore: add support for kubeburner in PR check
#329
SamYuan1990
opened
1 day ago
1
chore: pre install crio and pytorch images to save test image
#328
rootfs
closed
2 days ago
1
don't re-install crio if it is already there
#327
rootfs
closed
2 days ago
0
chore(deps): Bump aws-actions/configure-aws-credentials from 2 to 4 in the github-actions group
#326
dependabot[bot]
closed
2 days ago
1
[Feat][Chore] Pr level protect
#325
SamYuan1990
closed
2 days ago
0
fix(enable_ssh_file_access): Remove enforce
#324
KaiyiLiu1234
closed
2 days ago
1
Update equinix_k8s_kepler_action.yml: don't start on every PR, due to…
#323
rootfs
closed
5 days ago
1
chore: use the right dcgm output
#322
rootfs
closed
5 days ago
0
chore: install nvidia dcgm in ami
#321
rootfs
closed
5 days ago
0
feat: add GPU operation
#320
rootfs
closed
5 days ago
1
Creating AMI with NVIDIA driver
#319
rootfs
closed
5 days ago
0
AWS GPU CI support
#318
rootfs
closed
5 days ago
0
chore: add GHA to create aws ec2 ami with centos stream 9 and nvidia driver
#317
rootfs
closed
5 days ago
0
fix(exclude-sweapper-process): Exclude swapper process on metal kepler
#316
KaiyiLiu1234
closed
6 days ago
0
feat(aws): Incorporate AWS fully into Metal CI
#315
KaiyiLiu1234
closed
2 days ago
2
chore: add aws metal e2e
#314
rootfs
closed
1 week ago
0
[Nit] code refactor
#313
SamYuan1990
closed
2 days ago
2
chore(validate_model_defaults): Added Polynomial and Xgboost to default list of target models
#312
KaiyiLiu1234
closed
1 week ago
0
chore: add model train and validate report
#311
rootfs
closed
1 week ago
1
chore: make the model server image tag a param
#310
rootfs
closed
1 week ago
0
chore(model-server): Use hatch to deploy Model Server
#309
KaiyiLiu1234
closed
2 weeks ago
2
Update equinix_k8s_flow_churncheck.yml
#308
rootfs
closed
2 weeks ago
0
chore: ignore model server start error
#307
rootfs
closed
2 weeks ago
0
chore: use the latest model server image tag
#306
rootfs
closed
2 weeks ago
1
chore: use vm-train option to train models
#305
rootfs
closed
2 weeks ago
1
Stressng script is being run on bm and not in the VM
#304
KaiyiLiu1234
opened
2 weeks ago
0
fix(vm_metrics): Add --vm-train tag
#303
KaiyiLiu1234
closed
3 weeks ago
0
fix(disable_daily_vals): Disable daily validations
#302
KaiyiLiu1234
closed
3 weeks ago
0
fix(switch_to_main): Switched model server branch to main repo
#301
KaiyiLiu1234
closed
3 weeks ago
0
Switch Model Server Branch for Trainer to main
#300
KaiyiLiu1234
closed
2 weeks ago
0
Update "list of models to train" input for e2e workflows
#299
KaiyiLiu1234
opened
3 weeks ago
0
validator fails because of lack of idle metrics
#298
KaiyiLiu1234
closed
2 weeks ago
1
feat(vm_metrics): Add vm metrics training process
#297
KaiyiLiu1234
closed
3 weeks ago
0
chore(ci): ensure workflow fails on errors
#296
vprashar2929
closed
3 weeks ago
1
chore: not expose estimated idle power
#295
rootfs
closed
3 weeks ago
0
feat(train_logs): Upload Training Logs
#294
KaiyiLiu1234
closed
1 month ago
0
Add more thorough log checks to see if the model server failed deployment during validation
#293
KaiyiLiu1234
opened
1 month ago
3
Push to Github via dependabot should be a composite action
#292
KaiyiLiu1234
opened
1 month ago
1
increasing timeout so that churn can continue
#291
shashank-boyapally
closed
1 month ago
0
feat: Add support for deploying Kepler using Compose
#290
vprashar2929
closed
3 weeks ago
2
feat: add timestamp to validator directory
#289
vprashar2929
closed
1 month ago
0
add error handle for action steps
#288
SamYuan1990
closed
1 month ago
1
chore(deps): bump the github-actions group across 1 directory with 2 updates
#287
dependabot[bot]
closed
1 week ago
2
chore(deps): bump the github-actions group with 3 updates
#286
dependabot[bot]
closed
1 month ago
1
[CI] Hot fix for actions/upload-artifact and actions/download-artifact
#285
SamYuan1990
closed
1 month ago
1