Closed Andrew-Su-0718 closed 1 month ago
When I submit a pytorchjob with arena, I couldn't find any parameter for the shared memory size, which is very important for PyTorch training.
The size appears to be fixed at 2Gi:
    ...
        - mountPath: /dev/shm
          name: dshm
    ...
    ...
      - emptyDir:
          medium: Memory
          sizeLimit: 2Gi
        name: dshm
    ...
Does anyone know how to set the dshm size?
OK, I found a workaround. In the file /charts/pytorchjob/values.yaml, change:
shmSize: 2Gi
to
shmSize: 64Gi # or any value you want
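For anyone who wants to confirm the new size took effect, here is a minimal sketch (not part of arena) that can be run inside the training container. The helper name `shm_size_gib` is made up for illustration; it only relies on Python's standard `shutil.disk_usage`. Depending on the Kubernetes version, the size reported for the mount may not match the emptyDir `sizeLimit` exactly, so treat the number as a sanity check rather than proof.

```python
import os
import shutil


def shm_size_gib(path="/dev/shm"):
    """Return the total size, in GiB, of the filesystem mounted at `path`.

    `shm_size_gib` is a hypothetical helper for illustration only.
    """
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free) in bytes
    return usage.total / (1024 ** 3)


if __name__ == "__main__":
    # /dev/shm is where the dshm volume is mounted in the pod spec above.
    if os.path.isdir("/dev/shm"):
        print(f"/dev/shm total: {shm_size_gib():.1f} GiB")
    else:
        print("/dev/shm is not present on this system")
```

If the printed value is still around 2 GiB after changing `shmSize`, the job was probably submitted with the old chart values.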
Same issue
/assign