lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0
320 stars, 44 forks

Multi GPU problem #17

Open zsgj-Xxx opened 3 years ago

zsgj-Xxx commented 3 years ago

Hi, I think I've run into a new problem. I compared Ranger with Ranger21 on a fine-grained dataset, and Ranger21's results are much worse than Ranger's. I get exciting results on my own computer, but the results on a multi-GPU server are poor. Do you know why?

[Image: Ranger, net_top1]

[Image: Ranger21, net_top1]

zsgj-Xxx commented 3 years ago

All settings are the same except for the optimizer.

lessw2020 commented 3 years ago

Hi @zsgj-Xxx, thanks for opening the issue. We have not had a chance to test Ranger21 on multi-GPU yet, but some aspects of it will certainly need to be adjusted to run properly there (primarily because Ranger21 includes an internal LR scheduler), so performance on multi-GPU will not match single-GPU for now. The original Ranger should run on multi-GPU without issue, which is partly reflected in its better multi-GPU performance here. I'm finishing some testing on a new Ranger21 feature today and will then try to set up a multi-GPU scenario to optimize it for this case. I'll leave this issue open to track it. Thanks!
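One plausible way an internal LR scheduler can go wrong under multi-GPU training is a mismatch between the number of steps the schedule was sized for and the number of steps each process actually takes. The sketch below assumes the schedule length comes from `num_epochs * num_batches_per_epoch` (as passed to the Ranger21 constructor) and that data is sharded with PyTorch's `DistributedSampler`, so each rank sees roughly 1/world_size of the batches. The schedule function `scheduled_lr` (linear warmup, flat middle, linear warmdown starting at 72% of training, `min_lr=3e-5`) is a hypothetical approximation for illustration, not Ranger21's actual code, and the dataset size, batch size and warmup length are made-up numbers.

```python
import math


def batches_per_rank(num_samples: int, batch_size: int, world_size: int) -> int:
    """Batches each process sees per epoch, assuming DistributedSampler with
    drop_last=False (ceil(N / world_size) samples per rank) and a DataLoader
    with drop_last=False."""
    samples_per_rank = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_rank / batch_size)


def scheduled_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int,
                 warmdown_start_pct: float = 0.72, min_lr: float = 3e-5) -> float:
    """Hypothetical warmup / flat / warmdown schedule keyed to a step count.
    Illustration only; not Ranger21's implementation."""
    if step < warmup_steps:
        # Linear warmup from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    warmdown_start = int(total_steps * warmdown_start_pct)
    if step < warmdown_start:
        return base_lr
    # Linear decay from base_lr toward min_lr over the remaining steps.
    remaining = max(total_steps - warmdown_start, 1)
    frac = min((step - warmdown_start) / remaining, 1.0)
    return base_lr - frac * (base_lr - min_lr)


if __name__ == "__main__":
    num_epochs, num_samples, batch_size, world_size = 10, 6000, 32, 4
    single_gpu_batches = batches_per_rank(num_samples, batch_size, 1)   # 188
    ddp_batches = batches_per_rank(num_samples, batch_size, world_size)  # 47
    last_step = num_epochs * ddp_batches - 1  # final optimizer step on each rank

    # Schedule sized from the single-GPU batch count: warmdown never starts.
    stale = scheduled_lr(last_step, 1e-3, num_epochs * single_gpu_batches, warmup_steps=100)
    # Schedule sized from the per-rank batch count: LR has decayed close to min_lr.
    matched = scheduled_lr(last_step, 1e-3, num_epochs * ddp_batches, warmup_steps=100)
    print(f"stale schedule final lr:   {stale:.2e}")
    print(f"matched schedule final lr: {matched:.2e}")
```

Under these assumptions, a schedule sized for the single-GPU batch count ends training still at the peak LR, which is one candidate explanation worth ruling out by passing the per-rank `len(train_loader)` rather than a single-GPU figure.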

ryanstout commented 3 years ago

@lessw2020 Great project, thanks for all the hard work on it! I'm seeing similar issues. Interestingly, when switching to multiple GPUs (even after multiplying the LR by the number of GPUs), the loss doesn't drop any faster, whether measured in steps or in wall time. Any ideas why that would be? Thanks!
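Multiplying the LR by the number of GPUs is the "linear scaling" heuristic. It is motivated by the fact that PyTorch's `DistributedDataParallel` averages gradients across processes, so each optimizer step with per-GPU batch b on k GPUs uses the gradient of the mean loss over k·b samples, while each rank takes about 1/k as many steps per epoch. The pure-Python sketch below checks that equivalence on a toy one-parameter least-squares problem; `mean_grad`, `ddp_style_grad` and `linearly_scaled_lr` are hypothetical helpers, and equal-sized shards are assumed. It does not show that linear scaling suits Ranger21, whose internal schedule may interact with the scaled LR.

```python
import math


def mean_grad(w: float, xs: list[float], ys: list[float]) -> float:
    """d/dw of mean((w*x - y)^2) over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def ddp_style_grad(w: float, xs: list[float], ys: list[float], world_size: int) -> float:
    """Split the batch into equal shards (one per hypothetical rank), take each
    shard's mean gradient, then average them, mimicking DDP's gradient averaging."""
    assert len(xs) % world_size == 0, "shards must be equal-sized for this equivalence"
    shard = len(xs) // world_size
    grads = [
        mean_grad(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]
    return sum(grads) / world_size


def linearly_scaled_lr(base_lr: float, world_size: int) -> float:
    """Linear scaling heuristic: LR grows in proportion to the effective batch size."""
    return base_lr * world_size


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 9)]
    ys = [2.0 * x + 1.0 for x in xs]
    full = mean_grad(0.5, xs, ys)
    ddp = ddp_style_grad(0.5, xs, ys, world_size=4)
    print(math.isclose(full, ddp))  # averaged shard gradients match the full-batch gradient
    print(linearly_scaled_lr(1e-3, 4))
```

Because each multi-GPU step is a single step on a larger batch, loss plotted per step is not expected to fall k times faster; whether the scaled LR recovers that depends on the optimizer and schedule.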

rsomani95 commented 2 years ago

Hey @lessw2020, curious whether you've had a chance to work on this further? :)

@ryanstout This discussion is probably of interest to you: https://github.com/lessw2020/Ranger21/discussions/4#discussioncomment-826453