Dear Tian:
When I run the command below on CPU:
python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
--learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
--mu=0 --eval_every=1 --batch_size=10 \
--num_epochs=1 \
--model='stacked_lstm' | tee logs/'logs_sent140_mu0_E1_fedprox'
it runs very slowly, and worse, the outputs keep repeating the same numbers. The result is below:
5726 Clients in Total
Training with 10 workers ---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914
The same thing happens when I run shakespeare, but mnist and nist perform well. How can I solve this? Is there something wrong with stacked_lstm?
For non-convex models, you may need to tune the hyper-parameters (step size, etc.) a bit to make the training converge better. We report all hyper-parameters in the appendix of the draft. The configuration you are using is for the synthetic data.
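As an illustration of the kind of tuning this answer suggests, below is a minimal shell sketch that re-runs the same command with a few candidate step sizes. The specific learning-rate values and the log-file naming are assumptions for demonstration only, not the hyper-parameters reported in the paper's appendix; the only flags used are the ones already shown in the question.

# Hypothetical step-size sweep for sent140 / stacked_lstm.
# The learning rates listed here are placeholders; see the appendix
# of the draft for the values actually used in the paper.
mkdir -p logs
for lr in 0.001 0.003 0.01 0.03; do
  python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
    --learning_rate=$lr --num_rounds=200 --clients_per_round=10 \
    --mu=0 --eval_every=1 --batch_size=10 \
    --num_epochs=1 \
    --model='stacked_lstm' | tee logs/"logs_sent140_mu0_E1_fedprox_lr${lr}"
done

Comparing the "training loss" and "accuracy" lines across the resulting logs should make it clear which step size lets the stacked LSTM converge rather than oscillate.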