Closed lucasfbn closed 3 years ago
Ideas:
We will not align the training and test environments; they will be treated as two distinct problems. We therefore focus on optimizing for the training environment first and optimize for the test environment afterwards. In other words, we first solve the trading problem for each individual stock, and only then solve the problem of trading under fixed portfolio constraints (e.g. a budget).
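The two-stage split could be sketched roughly as below. Everything here is a hypothetical illustration, not the project's actual environments: the class names, the one-step long/flat action, and the budget check are all made up to show how the unconstrained per-stock problem differs from the budget-constrained one.

```python
class SingleStockEnv:
    """Training-stage sketch: one stock, action 1 = go long for one step,
    action 0 = stay flat. Reward is the next price change while long.
    No portfolio constraints of any kind."""

    def __init__(self, prices):
        self.prices = prices

    def run(self, policy):
        total = 0.0
        for t in range(len(self.prices) - 1):
            if policy(self.prices[t]) == 1:
                total += self.prices[t + 1] - self.prices[t]
        return total


class BudgetedPortfolioEnv:
    """Test-stage sketch: same per-stock logic, but a buy is rejected once
    it would exceed the fixed budget -- the extra constraint deferred to
    the second stage. Positions are closed after one step, so capital is
    only tied up within a single trade; a real env would hold positions
    across steps."""

    def __init__(self, prices_by_stock, budget):
        self.prices = prices_by_stock
        self.budget = budget

    def run(self, policy):
        cash = self.budget
        pnl = 0.0
        horizon = min(len(p) for p in self.prices.values()) - 1
        for t in range(horizon):
            for series in self.prices.values():
                # Reject the trade if the entry price exceeds remaining cash.
                if policy(series[t]) == 1 and series[t] <= cash:
                    cash -= series[t]       # capital tied up at entry
                    pnl += series[t + 1] - series[t]
                    cash += series[t + 1]   # position closed next step
        return pnl
```

With an always-long policy, the unconstrained env books every price change, while the budgeted env skips trades it cannot afford, which is exactly why the two problems can be optimized separately.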
In this context, we also expect to learn some more specific RL algorithms. It makes no sense to look into PPO any deeper at the moment, since it is unclear whether this algorithm provides the best results. We will therefore first read through the relevant papers and then come back to the actual implementation.
Closes #83 and closes #88.