mlfoundations / open_clip

An open source implementation of CLIP.

Example of launcher script with Horovod #823

Closed guillaumeguy closed 5 months ago

guillaumeguy commented 5 months ago

Hi there:

Is there a good script template for launching a training job with Horovod? I couldn't find one in ./scripts/, and the Horovod documentation uses horovod.run (is that what you would recommend?).

Thanks!

https://github.com/horovod/horovod/blob/master/examples/pytorch/pytorch_mnist.py#L253

guillaumeguy commented 5 months ago

This worked for me:


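# Put the current directory on PYTHONPATH so the training package can be imported.
if [ -z "$PYTHONPATH" ]; then
    export PYTHONPATH="$PWD"
else
    # Append instead of overwriting so existing entries are kept.
    PYTHONPATH="$PYTHONPATH:$PWD"
    export PYTHONPATH
fi

echo "PYTHONPATH $PYTHONPATH"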

python training/main.py \
--save-frequency 10 \
--save-most-recent \
--train-data "$TRAINING_DATA" \
--val-data "$VAL_DATA" \
...
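
For anyone reusing the PYTHONPATH part of the script above, here is a minimal sketch of the same append logic as a small helper. The function name `append_path` is hypothetical (it is not part of open_clip or Horovod), and the sketch only assumes a POSIX-compatible shell. It relies on the `${var:+word}` parameter expansion, which yields `word` only when `var` is set and non-empty, so no leading colon is produced when the list starts out empty.

```shell
#!/bin/sh
# append_path is a hypothetical helper, not part of open_clip or Horovod.
# $1: existing colon-separated path list (may be empty)
# $2: directory to add at the end of the list
append_path() {
    # "${1:+$1:}" expands to "$1:" only when $1 is non-empty,
    # so an empty list yields just the new directory.
    printf '%s\n' "${1:+$1:}$2"
}

# Same effect as the if/else block: add the current directory to PYTHONPATH.
PYTHONPATH="$(append_path "${PYTHONPATH:-}" "$PWD")"
export PYTHONPATH
echo "PYTHONPATH $PYTHONPATH"
```

Keeping the logic in a function makes it possible to reuse it for other colon-separated variables, though the explicit if/else version behaves the same way.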