litian96 / FedProx

Federated Optimization in Heterogeneous Networks (MLSys '20)
MIT License

Problems when running shakespeare and sent140 #2

Closed Mlair77 closed 5 years ago

Mlair77 commented 5 years ago

Dear Tian: when I run the command below on CPU:

python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
  --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
  --mu=0 --eval_every=1 --batch_size=10 \
  --num_epochs=1 \
  --model='stacked_lstm' | tee logs/'logs_sent140_mu0_E1_fedprox'

it runs very, very slowly, and worse, the outputs keep repeating the same numbers! The result is below:

5726 Clients in Total
Training with 10 workers
---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914

The same thing happens when I run shakespeare, but mnist and nist perform well. How can I solve this? Is there something wrong with stacked_lstm?

litian96 commented 5 years ago

For non-convex models, you may need to tune the hyper-parameters (step size, etc.) a bit to make training converge better. We report all hyper-parameters in the appendix of the draft. The config you are using is for the synthetic data.
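As a minimal sketch of what such tuning could look like (the learning rate below is illustrative only, not the value reported in the paper's appendix; please substitute the sent140 settings from the appendix), one might re-run the same command with a smaller step size:

# illustrative only: same command as above, with an assumed smaller --learning_rate
python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
  --learning_rate=0.001 --num_rounds=200 --clients_per_round=10 \
  --mu=0 --eval_every=1 --batch_size=10 \
  --num_epochs=1 \
  --model='stacked_lstm' | tee logs/'logs_sent140_mu0_E1_fedprox_lr0.001'

Only flags that already appear in the original command are used here; the log file name is just an example to keep runs with different step sizes separate.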