salesforce / fast-influence-functions

BSD 3-Clause "New" or "Revised" License

The command for starting the parallel variant? #13

Closed LinkToPast1990 closed 3 years ago

LinkToPast1990 commented 3 years ago

Hi @HanGuo97, I tried to use the parallel variant through `compute_influences_simplified` in `influence_helpers.py`. I ran `python run_experiments.py <exp_name>`, but the speed actually became slower.

Do I need to run `run_experiments.py` with a prefix such as `python -m torch.distributed.launch --nproc_per_node 4 run_experiments.py <exp_name>`?

Thanks!

HanGuo97 commented 3 years ago

Thanks for reaching out!

So by "slower," could you specify what you are comparing to?

I believe there are variables (e.g., `USE_PARALLEL`) you need to set to use the parallel variant -- it is not enabled by default in `run_experiments.py`. If you want to verify whether you are actually using multiple GPUs, you can check `nvidia-smi` or take a look at the logs (`logs/`).
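
For a quick programmatic check, here is a minimal sketch (standalone, not code from this repo) that asks PyTorch how many devices are visible and how much memory this process has allocated on each:

```python
import torch

# Print each visible CUDA device and the memory this process has allocated
# on it -- a rough sanity check that more than one GPU is actually in use.
# nvidia-smi or the logs/ directory give the same information.
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    used_mib = torch.cuda.memory_allocated(i) / 1024 ** 2
    print(f"GPU {i}: {name}, {used_mib:.0f} MiB allocated by this process")
```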

If everything looks right and you are still getting low speed -- there could be other factors that have an impact on the speed. The one that I can think of is the overhead in spinning up multiple processes. Factors that could contribute to this include hardware config (I guess?) and the dataset's size (since the main process will send data to child processes, hence large dataset requires more time to send them). This overhead is usually reasonable, but you might want to take a closer look in some special cases.