Closed: wqf321 closed this issue 2 years ago
Hi there,
Thanks for your interest in our HERO project. We did not run into this issue during our experiments. I see you are using your own virtual environment, which might be the source of the discrepancy in your experiment.
At first glance, there is one tensor that has a different data type across ranks (uint8 vs. int64). My suggestion is to find out exactly which tensor it is from "model/pretrain.py, line 452" and cast it to the same data type on all ranks.
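As a rough illustration of how the mismatched tensor could be tracked down, here is a minimal sketch that compares per-rank dtype listings and reports the names that disagree. The helper `find_dtype_mismatches` and the tensor names `input_ids` and `attn_masks` are hypothetical and not taken from the HERO code; in a real run, each rank would build its own listing from the tensors used near the failing line.

```python
def find_dtype_mismatches(dtypes_by_rank):
    """Return tensors whose dtype differs between ranks.

    dtypes_by_rank: list with one dict per rank, mapping a tensor name
    to its dtype as a string. (Hypothetical helper, for illustration.)
    """
    # Collect every tensor name reported by any rank.
    names = sorted(set().union(*[d.keys() for d in dtypes_by_rank]))
    mismatches = {}
    for name in names:
        # dtype seen on each rank, in rank order (None if missing on a rank).
        seen = [d.get(name) for d in dtypes_by_rank]
        if len(set(seen)) > 1:
            mismatches[name] = seen
    return mismatches


# Illustrative listings only. On a real rank the dict could be built with
# something like {name: str(t.dtype) for name, t in batch.items()},
# assuming the batch is a dict of tensors.
rank0 = {"input_ids": "torch.int64", "attn_masks": "torch.uint8"}
rank1 = {"input_ids": "torch.int64", "attn_masks": "torch.int64"}

mismatches = find_dtype_mismatches([rank0, rank1])
print(mismatches)
```

Once the offending tensor is known, casting it on every rank before the collective call, for example with `tensor.long()` or `tensor.to(torch.int64)` in PyTorch, should make the data types agree.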
Closed due to inactivity
Hi, I ran into a problem when running the command "horovodrun -np 2 python pretrain.py --config config/pretrain-tv-16gpu.json --output_dir ./pre_train_ckpt/ckpt/". Could you please help me?