CUN-bjy closed 3 years ago
@benthebear93 Ready to merge! Please check and give feedback. After the next update (adding a logger), it will be time to start experiments.
We have some problems now, and they interfere with the convergence of the models.
The critic value is very large. Maybe we need to normalize it.
Sometimes the env gives us a zero reward (which would mean perfectly good performance):
Episode: 15 Reward: -400.0
Episode: 16 Reward: 0.0
Episode: 17 Reward: -400.0
Episode: 18 Reward: -398.0
Episode: 19 Reward: -400.0
Episode: 20 Reward: -400.0
Episode: 21 Reward: -400.0
Episode: 22 Reward: -400.0
Episode: 23 Reward: -400.0
Episode: 24 Reward: -400.0
Episode: 25 Reward: -400.0
Episode: 26 Reward: -400.0
Episode: 27 Reward: -400.0
Episode: 28 Reward: -400.0
Episode: 29 Reward: -400.0
Episode: 30 Reward: -400.0
Episode: 31 Reward: -400.0
Episode: 32 Reward: -400.0
Episode: 33 Reward: 0.0
Episode: 34 Reward: -400.0
Episode: 35 Reward: -400.0
Episode: 36 Reward: -400.0
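To make the odd episodes easier to spot, the log above could be scanned automatically. This is only a rough sketch: it assumes the log lines keep the `Episode: N Reward: R` format shown above, and `parse_rewards`, `suspicious_episodes`, and the `threshold` cutoff are hypothetical names and values, not part of this repo.

```python
import re

# Matches lines in the format pasted above, e.g. "Episode: 16 Reward: 0.0".
LINE_RE = re.compile(r"Episode:\s*(\d+)\s+Reward:\s*(-?\d+(?:\.\d+)?)")


def parse_rewards(log_text):
    """Return a list of (episode, reward) pairs found in the log text."""
    return [(int(ep), float(r)) for ep, r in LINE_RE.findall(log_text)]


def suspicious_episodes(pairs, threshold=-1.0):
    """Return episode numbers whose reward is above `threshold`.

    The threshold is an assumed cutoff: in the log above almost every
    episode sits near -400, so anything close to 0 is worth a look.
    """
    return [ep for ep, r in pairs if r > threshold]


# A few lines copied from the log above.
sample = """Episode: 15 Reward: -400.0
Episode: 16 Reward: 0.0
Episode: 17 Reward: -400.0
Episode: 18 Reward: -398.0"""

pairs = parse_rewards(sample)
print(suspicious_episodes(pairs))
```

Running the same check over the full log would list the zero-reward episodes, which could then be replayed or logged in more detail.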
Something is wrong; we need to look into this more.
Don't merge yet!
This is why we need to normalize the inputs.
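One possible way to do this is a running mean/std normalizer applied to each observation before it reaches the actor and critic. The sketch below is an assumption, not what this PR implements: `RunningNormalizer`, `eps`, and `clip` are hypothetical names and defaults, and the statistics are tracked with Welford's online algorithm in plain Python.

```python
import math


class RunningNormalizer:
    """Tracks per-dimension running mean/variance (Welford's algorithm)."""

    def __init__(self, size, eps=1e-8, clip=5.0):
        self.eps = eps            # avoids division by zero (assumed value)
        self.clip = clip          # bound on normalized values (assumed value)
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size    # running sum of squared deviations

    def update(self, obs):
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Return the observation shifted, scaled, and clipped."""
        out = []
        for i, x in enumerate(obs):
            # Population variance; fall back to 1.0 until two samples exist.
            var = self.m2[i] / self.count if self.count > 1 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


# Two observations whose dimensions differ in scale by about 100x.
norm = RunningNormalizer(size=2)
norm.update([0.0, 100.0])
norm.update([2.0, 300.0])
scaled = norm.normalize([2.0, 300.0])
```

After normalization both dimensions end up on a comparable scale, so one large-valued input no longer dominates the network's early updates. The same idea could be tried on rewards if the critic value stays large.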
@benthebear93 Could you run an experiment with the current version of the models? I don't have much time to develop these days because of personal goals I'm working on. I'll develop more once I've achieved them. Please merge after the experiments.
@CUN-bjy I will do it today!
@CUN-bjy Something is definitely wrong, but I will merge it now.
T.T
Note for the record: this is why we have to use normalized inputs, although the results are not caused by this alone. https://nhigham.com/2020/08/04/what-is-numerical-stability/
Description
Feature
Checklist
make format