Closed by wxue24 11 months ago
After adding preprocessing to the data and adjusting the hyperparameters, I got a much better reward plot.
We can try further adjustments to reduce the training time, but the result already looks pretty good: the end reward is > 100.
The reward in the plot shown above decreases rather than increases over time. This could be due to the hyperparameters chosen or to how the state features are preprocessed.
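To make the preprocessing point concrete, here is a minimal sketch of one common option: normalizing each state feature with a running mean and standard deviation, so that features on very different scales don't dominate the updates. This is not necessarily the preprocessing used in this project. The class name `RunningNormalizer` and its methods are hypothetical, and it assumes states are flat lists of floats.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: per-feature running mean/variance (Welford's algorithm)."""

    def __init__(self, num_features, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * num_features
        self.m2 = [0.0] * num_features  # running sum of squared deviations from the mean
        self.eps = eps                  # avoids division by zero for constant features

    def update(self, state):
        """Fold one observed state into the running statistics."""
        self.count += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state):
        """Return the state shifted to zero mean and scaled to unit variance."""
        if self.count < 2:
            # Not enough data for a variance estimate yet; only center the state.
            return [x - m for x, m in zip(state, self.mean)]
        out = []
        for x, m, m2 in zip(state, self.mean, self.m2):
            std = math.sqrt(m2 / (self.count - 1))  # sample standard deviation
            out.append((x - m) / (std + self.eps))
        return out
```

A typical use would be to call `update(state)` on every observation during training and pass `normalize(state)` to the agent instead of the raw state. Whether this helps here would have to be checked against the reward plot.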
Some ideas to try: