a trivial question about kale (or argo) resource allocation

kubeflow-kale / kale

Kubeflow’s superfood for Data Scientists

Apache License 2.0

628 stars, 129 forks

Hello, and thank you for your great work.

I ran into an issue when I deployed my own pipeline with Kale. It's a small GAN, trained on 10k images of 256x256. When I simply executed the Jupyter notebook, it worked fine on Kubeflow; 4 CPUs and 8Gi of memory are allocated to the notebook.

However, when I started the pipeline with Kale, the pod that runs the train function was killed with OOM.

I found that only 128Mi of memory is requested for the pod that runs the train function:

    Limits:
      cpu: 1
      memory: 2Gi
    Requests:
      cpu: 100m
      memory: 128Mi
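To put those numbers side by side, here is a minimal sketch in plain Python (not Kale or Kubernetes code; the helper name `to_bytes` is hypothetical and only handles the binary suffixes that appear in this issue). It converts the quantities to bytes and compares the notebook's memory with the pod's limit and request.

```python
# Binary suffixes used by Kubernetes memory quantities (subset, for illustration only).
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '128Mi' or '2Gi' to bytes (hypothetical helper)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


notebook = to_bytes("8Gi")       # memory allocated to the notebook
pod_limit = to_bytes("2Gi")      # memory limit of the pipeline pod
pod_request = to_bytes("128Mi")  # memory request of the pipeline pod

print(notebook // pod_limit)     # 4: the notebook has four times the pod's limit
print(pod_limit // pod_request)  # 16: the limit is sixteen times the request
```

So the pod may use at most 2Gi, a quarter of what the notebook had, and a container that exceeds its memory limit is OOM-killed. That would be consistent with what I see, assuming the training needs more than 2Gi.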

Can I change the amount of memory allocated to the pod before the pipeline runs? Is there any way to set the resources allocated to each pipeline pod with Kale?


a trivial question about kale (or argo) resource allocation #368