[Closed] wallscheid closed this issue 1 year ago
I debugged it and did not find a systematic error so far.
@janstenner could you check next week if it is a learning problem? I changed the HPs (especially the action noise) which had quite an influence.
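Since the action noise is singled out as the most influential HP, here is a minimal sketch of the kind of decaying Gaussian exploration noise being tuned. The function name, initial scale, decay rate, and action limits are all illustrative assumptions, not the project's actual values:

```julia
# Hypothetical exploration noise: Gaussian perturbation with exponential
# per-episode decay, clipped to the action limits. All constants below
# (σ0, decay, a_min, a_max) are assumed for illustration only.
function noisy_action(a::Vector{Float64}, episode::Int;
                      σ0::Float64 = 0.2,       # initial noise scale (assumed)
                      decay::Float64 = 0.999,  # per-episode decay (assumed)
                      a_min::Float64 = -1.0,
                      a_max::Float64 = 1.0)
    σ = σ0 * decay^episode
    return clamp.(a .+ σ .* randn(length(a)), a_min, a_max)
end
```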
However, for me it took up to 1000 episodes (depending on the random initialisation) until the agent learned not to crash into a limit, but from there on, no further improvement was possible so far...
For the RL_Classic_Controller_Merge_Example notebook it was similar - but in the end the agent learned to fulfil the task (a 10 A 3-phase sine current)...
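For context, that 10 A 3-phase sine-current reference can be sketched as below (the 50 Hz frequency is my assumption; the notebook may configure it differently):

```julia
# Balanced three-phase sinusoidal current reference: 10 A amplitude,
# phases shifted by 120°. The 50 Hz grid frequency is assumed here.
function reference_currents(t::Float64; I_amp::Float64 = 10.0, f::Float64 = 50.0)
    ω = 2π * f
    return [I_amp * sin(ω * t - (k - 1) * 2π / 3) for k in 1:3]
end
```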
@MarvinMeyer can give (e-)technical support if needed
I changed the agent to the default external RL agent by RL.jl...: https://github.com/upb-lea/JuliaElectricGrid.jl/blob/develop/examples/scripts/RL_Complex_DEMO_external_agent.jl
resulting in the following learning curve:
Using the internal agent and changing the HPs to the same values used above did not result in a similar curve...
(Changes in agent_ddpg.jl -> HPs, and in multi_controller.jl -> do not scale down the initial weights; did I forget something?)
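To make the "scale down the initial weights" change concrete: the usual DDPG trick is to shrink the actor's output-layer weights so the initial policy outputs near-zero actions. A minimal sketch (the function name and the factor 1e-3 are assumptions, not the values in multi_controller.jl):

```julia
# Illustrative sketch of scaling down initial weights in place, so the
# untrained actor starts with near-zero outputs. The default factor of
# 1e-3 is an assumption for illustration.
function scale_down!(W::AbstractMatrix{Float64}; factor::Float64 = 1e-3)
    W .*= factor
    return W
end
```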
Did I mix something up? I had in mind that it was fine...
Dirty workaround in fresh console:
I tried to change the order of imports (`using ...`), but it did not solve the problem.
Even after 10k+ episodes of training, the average reward stays close to zero, and the simulated states during the test phase remain far from the specified references. From a first look at the code, I was not able to find an obvious bug. However, the current scenario result is far from satisfying.
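As a quick diagnostic for "states far from the specified references", one can track the RMS tracking error alongside the average reward; a minimal sketch (helper names are mine, not from the repo):

```julia
# Root-mean-square tracking error between simulated states and the
# reference trajectory -- a simple scalar to watch next to the average
# reward while debugging a non-learning agent.
rms_error(x::Vector{Float64}, ref::Vector{Float64}) =
    sqrt(sum(abs2, x .- ref) / length(x))
```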