taoyang1122 / adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Apache License 2.0

Maybe you should set `find_unused_parameters=True` in your config file? #11

Closed (leexinhao closed this issue 1 year ago)

leexinhao commented 1 year ago

As the title says. Otherwise you will encounter errors in DDP training. [image]
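For reference, a minimal sketch of the suggested change in an MMAction2-style Python config file. It assumes the training code reads a top-level `find_unused_parameters` key and forwards it to the distributed data-parallel wrapper, as upstream MMAction2 does; this has not been verified against every config in this repo.

```python
# Added at the top level of the experiment config (configs are plain Python files).
# Assumption: the training script looks up this key and passes it to the DDP
# wrapper, so that parameters that receive no gradient in an iteration do not
# stall gradient reduction.
find_unused_parameters = True
```

Enabling this flag adds an extra traversal of the autograd graph on every iteration, so it is usually only worth setting when the error actually appears.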

taoyang1122 commented 1 year ago

@leexinhao Hi, I didn't have this problem during training. Could you please let me know your environments and training scripts?

leexinhao commented 1 year ago

> @leexinhao Hi, I didn't have this problem during training. Could you please let me know your environments and training scripts?

I have reproduced this problem with torch 1.7.0 and torch 1.12.0, using slurm_train.sh. It is also mentioned in the mmaction2 documentation (adapt-image-models\docs\faq.md): [image]

I am curious why you didn't encounter this problem.
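One way to narrow down a difference like this is to check which trainable parameters actually receive no gradient, since those are the ones DDP reports as unused. The helper below is a hypothetical sketch (the name `params_without_grad` is not from this repo): it takes any iterable of `(name, param)` pairs whose params expose `.requires_grad` and `.grad`, which is what `model.named_parameters()` yields on a `torch.nn.Module`.

```python
def params_without_grad(named_parameters):
    """Return the names of trainable parameters that received no gradient.

    Intended to be called after one forward pass and `loss.backward()` on a
    model that is NOT wrapped in DDP. Frozen parameters
    (`requires_grad=False`) are skipped, because DDP does not track them.
    """
    return [
        name
        for name, param in named_parameters
        if param.requires_grad and param.grad is None
    ]
```

For example, `print(params_without_grad(model.named_parameters()))` after a single backward pass would list the candidates. An empty list suggests the model has no unused parameters under that input, and the error more likely comes from the environment.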

taoyang1122 commented 1 year ago

I am using dist_train.sh as shown in https://github.com/taoyang1122/adapt-image-models/blob/main/run_exp.sh. Will you have the problem using this?

leexinhao commented 1 year ago

> I am using dist_train.sh as shown in https://github.com/taoyang1122/adapt-image-models/blob/main/run_exp.sh. Will you have the problem using this?

It also happens with dist_train.sh, so maybe it is caused by my mmaction2 environment. It's not a big deal; I have reproduced your results. Thanks for your excellent work!