stanfordnmbl / osim-rl

Reinforcement learning environments with musculoskeletal models
http://osim-rl.stanford.edu/
MIT License

Random seed doesn't change the state #173

Closed whikwon closed 5 years ago

whikwon commented 5 years ago

In Round 1, does the random seed change the state at all?

I see no difference in the state when I change the random seed.

Alternatively, does the submission client use the same random seed as the library? I tried submitting, but the result was much worse than what I get on my local machine.

kidzik commented 5 years ago

The random seed does not change anything in Round 1. However, we've just deployed Round 2 on the server, so maybe you are submitting an agent designed for Round 1 to Round 2?

whikwon commented 5 years ago

Thanks. I've checked, and I did submit to Round 1.

Is there any difference between the server's environment and the local one, or are they exactly the same?

whikwon commented 5 years ago

Client reset observation: array([ 8.47317771e+01, -4.93364080e+01, -5.21290263e-01, 1.62475611e+01, -1.27453941e+00, -8.26986568e-01, 1.62511158e+01, -1.81215993e+00, -8.26986568e-01, -7.44823957e+00, -9.32238490e-01, 1.41166687e+00, 1.39971545e+01, 8.65567236e-01, -6.15696762e-01, 1.31893442e+01, 1.20657638e-01, 1.82690819e-01, -4.90706417e+01, 1.20657638e-01, -3.42992410e-01, 4.13565170e+01, 1.09353872e+00, -5.37051607e-01, -5.51919897e+01, 1.09353872e+00, -6.87969170e-01, 8.67997126e+01, 1.35538699e+02, -5.24394215e-01, 1.12202559e+01, -2.56552092e+00, -3.51181594e-01, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, -3.50971077e-01, 2.12857612e-02, -1.74787832e+02, -1.22251914e+00, 2.12857612e-02, -1.60670860e+02, 3.21928456e+00, 2.12857612e-02, 3.40723749e+01, 3.21928456e+00, 2.12857612e-02, 3.40723749e+01, -1.22251914e+00, 2.12857612e-02, 9.78115369e+03, -1.22251914e+00, 2.12857612e-02, 1.44790665e+02, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, -3.50971077e-01, 2.12857612e-02, 2.24531411e+02, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, 3.21928456e+00, 2.12857612e-02, 3.40723749e+01, -1.23969857e-01, 6.12930355e-03, -9.14200000e-02, -7.07000000e-02, 8.73900000e-01, -8.35000000e-02, -7.07000000e-02, 8.73900000e-01, 8.35000000e-02, -5.27643210e-02, 1.56940708e+00, 0.00000000e+00, 0.00000000e+00, 9.40000000e-01, 0.00000000e+00, -7.51998565e-02, 4.80793036e-02, 8.35000000e-02, -7.51998565e-02, 4.78079304e-01, 8.35000000e-02, -7.51998565e-02, 4.80793036e-02, -8.35000000e-02, -7.51998565e-02, 4.78079304e-01, -8.35000000e-02, 5.48301435e-02, 4.12930355e-03, -9.25000000e-02, -1.00700000e-01, 1.02150000e+00, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -8.72665000e-02, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, 
-0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -8.72665000e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, 2.19661393e+02, 2.19661393e+02, 1.44874331e+02, 1.44874331e+02, -0.00000000e+00, 4.27288112e+01, 4.27288112e+01, -1.76155929e-12, -5.04530634e+02, 0.00000000e+00, -4.64591558e+01, 1.62211275e-13, -4.94314807e+00, 6.78666028e-13, 1.94377676e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.83133027e+00, 1.08289327e-12, 3.10152958e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.20305916e+00, 1.08289327e-12, 3.10152958e+02, 0.00000000e+00, 0.00000000e+00, 
0.00000000e+00, 6.20305916e+00, 2.73017833e+02, 1.71787351e+02, 1.71787351e+02, 1.94300305e+02, 1.94300305e+02, 1.58012080e+02, 1.58012080e+02, -1.35733206e-12, -3.88755351e+02, 0.00000000e+00, 3.24610718e+01, -1.13337227e-13, 3.72616426e+00, 1.35733206e-12, 3.88755351e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.55081824e+01, 1.35733206e-12, 3.88755351e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.55081824e+01, 9.90329705e+01, 9.90329705e+01, 3.70005995e+02, 1.04050600e+02, 4.36793884e+02, 4.36793884e+02, 8.09447818e+02, 9.63636303e+03, -2.30926389e-14, 3.40723749e+01, 3.21928456e+00, 2.12857612e-02, 1.39971545e+01, 8.65567236e-01, -6.15696762e-01, -2.08860207e+02, 3.57025564e+00, -4.25215418e-14, -1.94743235e+02, -4.44180370e+00, 1.59317004e-14, 3.99319243e+02, 3.05461525e+02, 0.00000000e+00, 0.00000000e+00, -8.72665000e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 9.40000000e-01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.40087990e-14, 2.57083221e+00, -1.02376453e-15, -8.46656556e-02, 9.95273057e-01, -3.57608745e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.00000000e-02, 2.19661393e+02, 7.75230686e-02, 1.17001569e-13, 5.00000000e-02, 2.19661393e+02, 7.75230686e-02, 1.17001569e-13, 5.00000000e-02, 1.46257689e+02, 5.52613759e-02, 5.53125793e-11, 5.00000000e-02, 1.46257689e+02, 5.52613759e-02, 5.53125793e-11, 5.00000000e-02, 4.50991920e+01, 1.34342647e-01, 9.54219898e-17, 5.00000000e-02, 4.50991920e+01, 1.34342647e-01, 9.54219898e-17, 5.00000000e-02, 2.82794565e+02, 5.72025767e-02, 5.71894964e-14, 5.00000000e-02, 1.71787351e+02, 1.60848247e-01, 
1.01819825e-12, 5.00000000e-02, 1.71787351e+02, 1.60848247e-01, 1.01819825e-12, 5.00000000e-02, 2.02456271e+02, 6.35589621e-02, 2.10562614e-14, 5.00000000e-02, 2.02456271e+02, 6.35589621e-02, 2.10562614e-14, 5.00000000e-02, 1.59265260e+02, 1.30057686e-01, 3.34718350e-11, 5.00000000e-02, 1.59265260e+02, 1.30057686e-01, 3.34718350e-11, 5.00000000e-02, 9.96365248e+01, 6.02704462e-02, 2.23620250e-15, 5.00000000e-02, 9.96365248e+01, 6.02704462e-02, 2.23620250e-15, 5.00000000e-02, 4.06416137e+02, 4.49481412e-02, 3.41206438e-10, 5.00000000e-02, 1.04515869e+02, 6.28800098e-02, 8.52564214e-14, 5.00000000e-02, 4.37738569e+02, 7.89075687e-02, 6.16823383e-15, 5.00000000e-02, 4.37738569e+02, 7.89075687e-02, 6.16823383e-15])

Local env reset observation: array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 9.40000000e-01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -8.72665000e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.40723749e+01, 3.21928456e+00, 2.12857612e-02, 1.39971545e+01, 8.65567236e-01, -6.15696762e-01, -1.94743235e+02, -4.44180370e+00, 1.60427227e-14, 3.05461525e+02, 9.63636303e+03, -2.08860207e+02, 3.57025564e+00, -4.26325641e-14, 3.99319243e+02, 8.09447818e+02, -2.30926389e-14, 0.00000000e+00, 9.40000000e-01, 0.00000000e+00, -7.07000000e-02, 8.73900000e-01, 8.35000000e-02, -7.51998565e-02, 4.78079304e-01, 8.35000000e-02, -7.51998565e-02, 4.80793036e-02, 8.35000000e-02, -7.07000000e-02, 8.73900000e-01, -8.35000000e-02, -7.51998565e-02, 4.78079304e-01, -8.35000000e-02, -7.51998565e-02, 4.80793036e-02, -8.35000000e-02, -1.23969857e-01, 6.12930355e-03, -9.14200000e-02, 5.48301435e-02, 4.12930355e-03, -9.25000000e-02, -1.00700000e-01, 1.02150000e+00, 0.00000000e+00, -5.27643210e-02, 1.56940708e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.39971545e+01, 8.65567236e-01, -6.15696762e-01, 1.62511158e+01, 
-1.81215993e+00, -8.26986568e-01, -4.90706417e+01, 1.20657638e-01, -3.42992410e-01, 1.31893442e+01, 1.20657638e-01, 1.82690819e-01, 1.62475611e+01, -1.27453941e+00, -8.26986568e-01, -5.51919897e+01, 1.09353872e+00, -6.87969170e-01, 4.13565170e+01, 1.09353872e+00, -5.37051607e-01, 8.47317771e+01, -4.93364080e+01, -5.21290263e-01, 8.67997126e+01, 1.35538699e+02, -5.24394215e-01, 1.12202559e+01, -2.56552092e+00, -3.51181594e-01, -7.44823957e+00, -9.32238490e-01, 1.41166687e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -8.72665000e-02, -0.00000000e+00, 0.00000000e+00, -8.72665000e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.21928456e+00, 2.12857612e-02, 3.40723749e+01, -1.22251914e+00, 2.12857612e-02, -1.60670860e+02, -1.22251914e+00, 2.12857612e-02, 1.44790665e+02, -1.22251914e+00, 2.12857612e-02, 9.78115369e+03, -3.50971077e-01, 2.12857612e-02, -1.74787832e+02, -3.50971077e-01, 2.12857612e-02, 2.24531411e+02, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, -3.50971077e-01, 2.12857612e-02, 1.03397923e+03, 
3.21928456e+00, 2.12857612e-02, 3.40723749e+01, 3.21928456e+00, 2.12857612e-02, 3.40723749e+01, 2.19661393e+02, 1.44874331e+02, 1.94300305e+02, 4.27288112e+01, 1.71787351e+02, 1.58012080e+02, 9.90329705e+01, 4.36793884e+02, 2.19661393e+02, 1.44874331e+02, 1.94300305e+02, 4.27288112e+01, 1.71787351e+02, 1.58012080e+02, 9.90329705e+01, 4.36793884e+02, 2.73017833e+02, 3.70005995e+02, 1.04050600e+02, -0.00000000e+00, -1.35733206e-12, -3.88755351e+02, 0.00000000e+00, 3.24610718e+01, -1.13337227e-13, 3.72616426e+00, 1.35733206e-12, 3.88755351e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.55081824e+01, 1.35733206e-12, 3.88755351e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.55081824e+01, -1.76155929e-12, -5.04530634e+02, 0.00000000e+00, -4.64591558e+01, 1.62211275e-13, -4.94314807e+00, 6.78666028e-13, 1.94377676e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.83133027e+00, 1.08289327e-12, 3.10152958e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.20305916e+00, 1.08289327e-12, 3.10152958e+02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.20305916e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.00000000e-02, 7.75230686e-02, 1.17001569e-13, 2.19661393e+02, 5.00000000e-02, 5.52613759e-02, 5.53125793e-11, 1.46257689e+02, 5.00000000e-02, 6.35589621e-02, 2.10562614e-14, 2.02456271e+02, 5.00000000e-02, 1.34342647e-01, 9.54219898e-17, 4.50991920e+01, 5.00000000e-02, 1.60848247e-01, 1.01819825e-12, 1.71787351e+02, 5.00000000e-02, 1.30057686e-01, 3.34718350e-11, 1.59265260e+02, 5.00000000e-02, 6.02704462e-02, 2.23620250e-15, 9.96365248e+01, 5.00000000e-02, 7.89075687e-02, 6.16823383e-15, 4.37738569e+02, 5.00000000e-02, 7.75230686e-02, 1.17001569e-13, 2.19661393e+02, 5.00000000e-02, 5.52613759e-02, 5.53125793e-11, 1.46257689e+02, 
5.00000000e-02, 6.35589621e-02, 2.10562614e-14, 2.02456271e+02, 5.00000000e-02, 1.34342647e-01, 9.54219898e-17, 4.50991920e+01, 5.00000000e-02, 1.60848247e-01, 1.01819825e-12, 1.71787351e+02, 5.00000000e-02, 1.30057686e-01, 3.34718350e-11, 1.59265260e+02, 5.00000000e-02, 6.02704462e-02, 2.23620250e-15, 9.96365248e+01, 5.00000000e-02, 7.89075687e-02, 6.16823383e-15, 4.37738569e+02, 5.00000000e-02, 5.72025767e-02, 5.71894964e-14, 2.82794565e+02, 5.00000000e-02, 4.49481412e-02, 3.41206438e-10, 4.06416137e+02, 5.00000000e-02, 6.28800098e-02, 8.52564214e-14, 1.04515869e+02, -8.46656556e-02, 9.95273057e-01, -3.57608745e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 4.40087990e-14, 2.57083221e+00, -9.88190446e-16])

whikwon commented 5 years ago

Comparing the client and local environments, I found that some observation values differ.

Throughout training, some values were tiny (near zero) in the local environment, but in the client they are not.

import numpy as np

# Indices of features whose rolling mean over local training is near zero
tiny_val_idx = np.where(np.abs(rolling_mean) < 1e-8)

print(tiny_val_idx)
# (array([ 8, 13, 25, 30, 33, 42, 47, 50]),)

# Client values at those indices: clearly non-zero
print(client_reset_obs[tiny_val_idx])
# array([-8.26986568e-01,  8.65567236e-01,  1.09353872e+00,  1.12202559e+01,
#        -3.50971077e-01,  3.21928456e+00,  3.40723749e+01,  9.78115369e+03])

# Local values at the same indices: zero or numerical noise
print(local_reset_obs[tiny_val_idx])
# array([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
#         0.00000000e+00,  1.60427227e-14, -4.26325641e-14, -2.30926389e-14])

whikwon commented 5 years ago

Sorry, my mistake: the order of the flattened observation vector was different between the client and the local environment.
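For anyone who runs into the same apparent mismatch, here is a minimal sketch of a check that would have caught it: sort both flattened vectors and compare them with a tolerance, so that a pure reordering is not mistaken for a real difference. The helper name `same_up_to_order` and the toy arrays are made up for illustration; they are not part of osim-rl, and the tolerances are assumptions.

```python
import numpy as np

def same_up_to_order(a, b, atol=1e-8):
    """Return True if a and b hold the same values, ignoring their order.

    Hypothetical helper for illustration; not part of osim-rl.
    """
    a = np.sort(np.asarray(a, dtype=float).ravel())
    b = np.sort(np.asarray(b, dtype=float).ravel())
    # Different lengths can never be a pure reordering.
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=1e-6, atol=atol))

# Toy stand-ins for the two reset observations (not the real vectors).
client_obs = np.array([84.73, -49.34, 0.0, 0.94, 1.6e-14])
local_obs = np.array([0.0, 0.94, 1.5e-14, 84.73, -49.34])

# Element-wise comparison reports a mismatch because the order differs.
print(np.allclose(client_obs, local_obs))
# Order-insensitive comparison shows the values themselves agree.
print(same_up_to_order(client_obs, local_obs))
```

This only confirms that the two vectors contain the same values; it does not recover which index in one corresponds to which index in the other. The absolute tolerance absorbs numerical noise on the order of 1e-14, like the near-zero entries in the dumps above.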

kidzik commented 5 years ago

Thanks for reporting this!