Closed LinkToPast1990 closed 3 years ago
Thanks for reaching out!
So by "slower," could you specify what you are comparing to?
I believe there are variables (e.g., `USE_PARALLEL`) you need to set to use the parallel variant -- it is not enabled by default in `run_experiments.py`. If you are looking for ways to verify whether you are using multiple GPUs, you can use `nvidia-smi` or take a look at the logs (`logs/`).
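As a quick check, something like the following parses the `nvidia-smi` utilization query so you can see whether all GPUs are actually busy while the experiment runs (`gpus_in_use` is an illustrative helper, not part of this repo):

```python
import shutil
import subprocess

def gpus_in_use():
    """Return [index, utilization%] pairs from nvidia-smi, or [] if it
    is not installed. (Illustrative helper, not from this repo.)"""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    ).stdout
    # One CSV row per GPU, e.g. "0, 87"
    return [line.split(", ") for line in out.strip().splitlines() if line]

print(gpus_in_use())
```

If only one row ever shows non-zero utilization, the parallel variant is likely not being picked up.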
If everything looks right and you are still seeing low speed, there could be other factors at play. The main one I can think of is the overhead of spinning up multiple processes. Factors that contribute to this include the hardware configuration (I would guess) and the dataset's size: the main process sends data to the child processes, so a large dataset takes longer to transfer. This overhead is usually reasonable, but you might want to take a closer look in some special cases.
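To get a feel for the data-transfer part of that overhead, the sketch below measures serialization cost with `pickle` directly, since `multiprocessing` pickles arguments before sending them to child processes (`ipc_cost` is a hypothetical helper for illustration, not something in this repo):

```python
import pickle
import time

def ipc_cost(payload):
    """Rough proxy for what multiprocessing pays to ship `payload` to a
    child process: arguments are pickled in the parent and unpickled in
    the worker. Returns (seconds, serialized size in bytes).
    (Hypothetical helper, not from this repo.)"""
    start = time.perf_counter()
    blob = pickle.dumps(payload)
    pickle.loads(blob)
    return time.perf_counter() - start, len(blob)

small_t, small_bytes = ipc_cost(list(range(1_000)))
large_t, large_bytes = ipc_cost(list(range(2_000_000)))
print(f"1K ints: {small_bytes} bytes; 2M ints: {large_bytes} bytes")
```

If the per-dispatch payload is large relative to the compute each worker does, parallelism can easily come out slower than the serial run.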
Hi @HanGuo97, I tried to use the parallel variant through `_compute_influences_simplified` in `influence_helpers.py`. I used the command
```
python run_experiments.py <exp_name>
```
but the speed became even slower. Do I need to run `run_experiments.py` with a prefix, like
```
python -m torch.distributed.launch --nproc_per_node 4 run_experiments.py <exp_name>
```
? Thanks!