kubeflow/pytorch-operator
PyTorch on Kubernetes
Apache License 2.0
307 stars
143 forks
issues
Separate cluster scoped and namespace scoped resources
#215
johnugeorge
opened
5 years ago
1
PyTorchJob 1.0
#214
johnugeorge
opened
5 years ago
2
Minor change in log
#213
johnugeorge
closed
5 years ago
3
Delete v1beta2 code
#212
johnugeorge
closed
5 years ago
3
Add controller-name label for Pods and services
#211
johnugeorge
closed
5 years ago
6
Add qps and burst options
#210
ohmystack
closed
5 years ago
4
use the priority of kube-batch
#209
YesterdayxD
opened
5 years ago
11
Set pytorchjob defaults in test utils
#208
ohmystack
closed
5 years ago
4
Update codegen and verify in CI
#207
ohmystack
closed
5 years ago
4
Failed to deploy pytorch operator
#206
xiaqunfeng
closed
5 years ago
15
Can I use deployment.yaml in manifests directly
#205
wynn5a
closed
5 years ago
2
Common label changes with K8s upgrade to 1.12.3
#204
johnugeorge
closed
5 years ago
7
MPI distributed training job failed on master node with message "MPI process group does not support multi-GPU collectives" but succeed on worker node
#203
YYStreet
opened
5 years ago
2
NCCL backend did not start distributed training
#202
YYStreet
closed
5 years ago
4
add total suffix in counter metrics
#201
yeya24
closed
5 years ago
6
Update manifest to v0.6.0
#200
hougangliu
closed
5 years ago
10
Update manifest to latest
#199
hougangliu
closed
5 years ago
6
Use multi-build to build pytorch-operator image
#198
hmtai
closed
5 years ago
13
Update build.sh
#197
hmtai
closed
5 years ago
3
Update Dockerfile
#196
hmtai
closed
5 years ago
10
Use multi-stage build for pytorch operator Dockerfile
#195
hmtai
closed
5 years ago
4
feat: Support running although it is useless
#194
gaocegege
closed
5 years ago
10
fix: Fix the comments
#193
gaocegege
closed
5 years ago
5
add kubeconfig flag
#192
yeya24
closed
5 years ago
7
Remove unnecessary services for worker
#191
hougangliu
closed
5 years ago
17
Integration into kubeflow pipeline
#190
miguelvr
opened
5 years ago
7
update release script; fix post submit
#189
johnugeorge
closed
5 years ago
3
update release script; fix post submit
#188
kunmingg
closed
5 years ago
7
use init container for worker pod to wait master pod ready
#187
zlcnju
closed
5 years ago
15
gang schedule bug
#186
zlcnju
closed
5 years ago
16
resolve test image conflict
#185
kunmingg
opened
5 years ago
3
Minor fix to add CoreV1 to scheme
#184
johnugeorge
closed
5 years ago
8
Implement "earlier" resource validation
#183
johanfleury
opened
5 years ago
3
Add documentation on RBAC authorizations
#182
johanfleury
closed
5 years ago
3
set annotation automatically when EnableGangScheduling is set to true…
#181
zlcnju
closed
5 years ago
9
adds sdk for pytorchjob from OpenAPI Spec
#180
swiftdiaries
closed
4 years ago
5
fix wrong api version when delete pytorchjob
#179
wackxu
closed
5 years ago
11
Moving crd to manifests
#178
johnugeorge
closed
5 years ago
3
Adds developer guide and sample CRD for v1
#177
krishnadurai
closed
5 years ago
6
Update image base to UBI8 GA
#176
johnugeorge
closed
5 years ago
4
PyTorch Operator Prometheus Metrics
#175
krishnadurai
closed
5 years ago
7
Prometheus Operator for Pytorch
#174
krishnadurai
closed
5 years ago
3
Skip condition update when succeeded
#173
johnugeorge
closed
5 years ago
2
Sync PodGroup fix
#172
johnugeorge
closed
5 years ago
3
Check pending status for pastBackoffLimitOnFailure
#171
johnugeorge
closed
5 years ago
2
Set start timestamp
#170
johnugeorge
closed
5 years ago
4
Making ResyncPeriod configurable
#169
johnugeorge
closed
5 years ago
2
add uuid to id for leader election
#168
fisherxu
closed
5 years ago
4
Polish documentation for PyTorch V1
#167
richardsliu
closed
5 years ago
3
Remove v1beta1 code
#166
johnugeorge
closed
5 years ago
4