nextgrid / deep-learning-labs-openAI

Deep Learning Labs by Nextgrid
https://nextgrid.ai/dll

BipedalWalkerHardcore-v3 #24

Open Mindgames opened 3 years ago

Mindgames commented 3 years ago

SAC (the "Num steps" counter is not meaningful because it resets every 12 h 😅; "Episodes" is tracked correctly)

Episodes: 14231
Num steps: 20000
Mean Episode reward: -66.00 
Last Episode reward: -35.75 
=========== NEXTGRID.AI ================
Episodes: 14242
Num steps: 40000
Mean Episode reward: 85.00 
Last Episode reward: 260.79 
=========== NEXTGRID.AI ================
Episodes: 14252
Num steps: 60000
Mean Episode reward: -72.00 
Last Episode reward: -104.83 
=========== NEXTGRID.AI ================
Episodes: 14262
Num steps: 80000
Mean Episode reward: 88.00 
Last Episode reward: 177.13 
=========== NEXTGRID.AI ================
Episodes: 14274
Num steps: 100000
Mean Episode reward: 40.00 
Last Episode reward: 74.45 
=========== NEXTGRID.AI ================
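
One possible way to keep a meaningful step count across the 12 h resets is to persist a cumulative counter next to the per-session one. This is only a minimal sketch: the `ProgressTracker` class and the `progress.json` file are hypothetical names, not part of the training script used for this run.

```python
import json
import os


class ProgressTracker:
    """Hypothetical helper: keeps a total step count that survives restarts."""

    def __init__(self, path="progress.json"):
        self.path = path
        self.session_steps = 0  # starts from zero in every new session
        self.total_steps = 0    # restored from disk when a file exists
        if os.path.exists(path):
            with open(path) as f:
                self.total_steps = json.load(f)["total_steps"]

    def add_steps(self, n):
        """Record n environment steps in both counters."""
        self.session_steps += n
        self.total_steps += n

    def save(self):
        """Write the cumulative count so the next session can resume it."""
        with open(self.path, "w") as f:
            json.dump({"total_steps": self.total_steps}, f)
```

Calling `save()` whenever the model is checkpointed would let the log report `total_steps` instead of a counter that starts over with each session.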
Mindgames commented 3 years ago

Discontinuing training of the SAC agent (attempt 1) at episode 18227. I have not seen any significant improvement over the last 8000 episodes.

Saved model: SACEP18227.zip

Episodes: 18180
Num steps: 20000
Mean Episode reward: -3.00 
Last Episode reward: -27.17 
=========== NEXTGRID.AI ================
Episodes: 18190
Num steps: 40000
Mean Episode reward: 14.00 
Last Episode reward: -2.79 
=========== NEXTGRID.AI ================
Episodes: 18201
Num steps: 60000
Mean Episode reward: -7.00 
Last Episode reward: -82.99 
=========== NEXTGRID.AI ================
Episodes: 18211
Num steps: 80000
Mean Episode reward: -47.00 
Last Episode reward: -55.45 
=========== NEXTGRID.AI ================
Episodes: 18227
Num steps: 100000
Mean Episode reward: -29.00 
Last Episode reward: 151.57 
=========== NEXTGRID.AI ================
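
The "no significant improvement" judgement could be made explicit by comparing the mean reward of the most recent window of episodes with the window before it. A minimal sketch, assuming a hypothetical `has_plateaued` helper; the default window and the `min_gain` threshold are arbitrary illustration values, and this is not how the decision was made for this run.

```python
from statistics import mean


def has_plateaued(episode_rewards, window=8000, min_gain=10.0):
    """Return True if the last `window` episodes improved the mean reward
    by less than `min_gain` compared with the `window` episodes before them."""
    if len(episode_rewards) < 2 * window:
        return False  # not enough history to compare two full windows
    recent = mean(episode_rewards[-window:])
    previous = mean(episode_rewards[-2 * window:-window])
    return recent - previous < min_gain
```

With per-episode rewards collected in a list, `has_plateaued(rewards)` would give a repeatable stopping check instead of one based on reading the log by eye.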

Evaluation

Score over 100 episodes [187.9042206593894, -32.469026104615324, -70.81650983082687, -49.108647021474994, 6.6287390809141185, -74.71416641923848, -50.45687534247784, 88.28404775641464, 0.014338045673426336, 17.969497308955905, -66.81715899625112, -28.691996491327945, -139.02874316012532, 26.495699529005815, -36.35097857781624, -74.5204739946293, 167.36019906158245, -26.179938038739667, -22.289288508718233, 15.167179324137678, 1.8909846697729051, 182.17512128805345, 34.72523689738251, 244.32722957669466, -33.98489109002672, -40.851483853890215, -64.54334423266705, -66.58005670726794, -146.2820425079165, -13.605059392333018, -73.83638758936961, -159.31869210409997, 29.742366938312703, 140.61978609780894, 33.622395741254174, 44.58727985049982, 53.83144635493758, 36.281755764794944, -89.79591975389484, -28.942284998010805, -36.025324987541275, -26.85681127047104, -32.10085688938568, -75.81729754111244, -23.58202656382071, -50.352805557642704, 137.85517890763705, 96.88171925518286, 83.43461953848825, -130.32948564565984, 240.74941018095427, 23.99701260345003, -61.8453168429546, -81.34782597544356, 76.62625033251996, 24.231235143893777, -23.892761363203334, -21.738343958033894, 118.21414517715915, -13.19753073484508, -25.766785273402323, -43.5121859099138, -36.37378502375948, -1.9337168512219813, -32.82329384288326, -60.88687292209774, 239.5053564478821, -39.23882463333816, -64.18489878883399, 238.25595120838062, 59.92245905298061, 28.687137087715314, 2.874094137204106, -78.74272753831988, -125.59422473928531, -57.09273500979067, -40.63004329261054, -49.30630773666518, 17.88840674762548, -70.60731948009933, 139.30214062170873, -132.428558565979, -36.0284022168348, 234.0635939055876, -0.9634438812198596, -28.879775302508282, -56.77161059009773, -54.40022465335941, 18.298268459767932, -64.97590637592022, -68.91563135885765, 151.03792318596598, 109.0681308254351, -36.0391907596278, -56.60008994490157, -58.21921656397303, 35.84351538844013, -136.1905728013061, 
-73.37861972183478, -113.07140942185774]
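
The raw list is hard to read at a glance, so a few summary statistics may help. A minimal sketch, assuming a hypothetical `summarize` helper. To keep the example short, `scores` holds only the first ten values from the list above. The 300-point threshold is the commonly cited solve criterion for the BipedalWalker environments and is treated here as an assumption.

```python
from statistics import mean, stdev

# First ten evaluation scores copied from the list above (excerpt only).
scores = [
    187.9042206593894, -32.469026104615324, -70.81650983082687,
    -49.108647021474994, 6.6287390809141185, -74.71416641923848,
    -50.45687534247784, 88.28404775641464, 0.014338045673426336,
    17.969497308955905,
]


def summarize(scores, solved_at=300.0):
    """Reduce a list of episode scores to a few summary statistics."""
    return {
        "episodes": len(scores),
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "max": max(scores),
        # Number of episodes that reach the assumed solve threshold.
        "episodes_solved": sum(s >= solved_at for s in scores),
    }


print(summarize(scores))
```

Passing the full 100-episode list to `summarize` would give the figures for the whole evaluation run.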
Mindgames commented 3 years ago

TD3

The learning curve looks similar to the one seen with SAC.

Episodes: 14781
Num steps: 20000
Mean Episode reward: -150.00 
Last Episode reward: -126.42 
=========== NEXTGRID.AI ================
check if it shall save
Episodes: 14784
Num steps: 25000
Mean Episode reward: -142.00 
Last Episode reward: -142.18 

