kubeflow / pytorch-operator

PyTorch on Kubernetes
Apache License 2.0
306 stars 143 forks source link

add cpu/mem resource limit to worker init container causes unmarshal error #236

Closed xrmzju closed 4 years ago

xrmzju commented 4 years ago

i create a configmap to customize the worker init container(add cpu/mem limits)

~/root(master) # kubectl -nkubeflow  exec -it pytorch-operator-6df7b4c9c4-47qb2 cat /etc/config/initContainer.yaml                    
- name: init-pytorch
  image: alpine:3.10
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: "100m"
      memory: "100Mi"
  command: ['sh', '-c', 'until nslookup {{.MasterAddr}}; do echo waiting for master; sleep 2; done;']

and the pytorch controller failed to create worker pod with errors:

cannot unmarshal !!str `100m` into resource.Quantity

and i googled it and found this issue, so i tried replace the "gopkg.in/yaml.v2" with "github.com/kubernetes-sigs/yaml" and everything works fine. should we replace it in upstream

xrmzju commented 4 years ago

@gaocegege

gaocegege commented 4 years ago

I think we should.

/cc @johnugeorge

@xrmzju Are you interested in it? I'd appreciate if you could fix it.

xrmzju commented 4 years ago

I think we should.

/cc @johnugeorge

@xrmzju Are you interested in it? I'd appreciate if you could fix it.

i'll take it. and by the way, should we replace the godep with go mod?

gaocegege commented 4 years ago

@xrmzju We have the plan but I did not have the bandwidth for it. If you have time, please replace it with go mod. Thanks.

xrmzju commented 4 years ago

@xrmzju We have the plan but I did not have the bandwidth for it. If you have time, please replace it with go mod. Thanks.

sure, i have already replace it in my own repo. i'll open the pr soon.

johnugeorge commented 4 years ago

Sounds good to me. we can do go mod separately. we have to do it for other repos too.

gaocegege commented 4 years ago

Agree.

we can do go mod separately. we have to do it for other repos too.