sql-machine-learning / elasticdl

Kubernetes-native Deep Learning Framework
https://elasticdl.org
MIT License

Allow executing arbitrary commands before training starts #1039

Closed terrytangyuan closed 5 years ago

terrytangyuan commented 5 years ago

Currently we need to convert ODPS data to RecordIO files before training starts. The generated data has to be saved in shared storage that the worker pods can access. One solution would be to perform the data conversion in one of the pods and start the training tasks once it has finished.

cc: @ywskycn

ywskycn commented 5 years ago

@terrytangyuan Like performing some init tasks before starting the training? This can be done in the master pod, right? For example, we can ask the master pod to do the conversion locally, or to trigger a Spark job in a remote cluster to do it. Worker pods would then be launched only after the initTask finishes.
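
A rough sketch of that ordering, in case it helps the discussion. This is not ElasticDL's API: `run_init_command`, `start_training`, and the `launch_workers` callback are hypothetical names, and the sketch assumes the init tasks can be expressed as plain command strings that the master runs locally.

```python
import shlex
import subprocess


def run_init_command(command):
    """Run one command string without a shell and return its exit code."""
    # shlex.split turns the string into an argv list, so no shell is involved.
    return subprocess.run(shlex.split(command)).returncode


def start_training(init_commands, launch_workers, run_command=run_init_command):
    """Run every init command in order; launch workers only if all succeed.

    All names here are hypothetical. `launch_workers` stands in for whatever
    the master does to create worker pods.
    """
    for command in init_commands:
        code = run_command(command)
        if code != 0:
            # Stop before any worker starts, so training never sees partial data.
            raise RuntimeError(
                f"init command failed with exit code {code}: {command}"
            )
    return launch_workers()
```

The `run_command` parameter is there so the local runner could be swapped for something that submits a remote job and waits for it, without changing the "init first, workers after" ordering.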

terrytangyuan commented 5 years ago

Yes exactly

ywskycn commented 5 years ago

LGTM. Let's sync with others in the next standup.

terrytangyuan commented 5 years ago

#1106 supersedes this. Closing this for now.