Closed theomarzaki closed 5 years ago
Log:
Changed reward allocation (part of research)
Removed RFC as a reward aid, which gave much faster training times
Changed IsTerminal for Model Learning to account for extreme agent actions that resulted in inf, -inf, or NaN values
Saved the rewards and loss over time for the models to a text file for further extrapolation
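
For illustration, the last two changes could look roughly like the sketch below. This is not the code from this pull request: the names `is_terminal` and `save_history`, the argument shapes, and the tab-separated file layout are all assumptions made for the example. It only shows the general idea of ending an episode on non-finite values and writing per-step rewards and losses to a text file.

```python
import math
from typing import Iterable, Sequence


def is_terminal(state: Sequence[float], reward: float) -> bool:
    """Hypothetical check: treat the step as terminal when the state or
    the reward contains inf, -inf, or NaN."""
    values_are_finite = math.isfinite(reward) and all(
        math.isfinite(x) for x in state
    )
    return not values_are_finite


def save_history(path: str, rewards: Iterable[float], losses: Iterable[float]) -> None:
    """Hypothetical logger: write one tab-separated line per step
    (step index, reward, loss), preceded by a header line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("step\treward\tloss\n")
        # zip stops at the shorter sequence, so unmatched entries are dropped.
        for step, (reward, loss) in enumerate(zip(rewards, losses)):
            f.write(f"{step}\t{reward}\t{loss}\n")
```

A training loop would call `is_terminal` after each agent action and `save_history` once per model at the end of training, so the resulting file can be loaded later for plotting or curve fitting.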