If I understand the code correctly, there is a bug at line 123 of trading_env.py:
self.posns[self.step] = action - 1
self.trades[self.step] = self.posns[self.step] - bod_posn
I think it should be:
self.posns[self.step] = bod_posn + action - 1
self.trades[self.step] = action - 1
Or am I misreading the variables? Does self.trades hold the trade made at each step, and self.posns the position after that trade is executed?
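To make the two readings concrete, here is a small standalone sketch. It is not the code from trading_env.py: the function names and the action sequence are made up for illustration, and I am assuming that actions take the values 0, 1, 2 and that the position starts flat at 0.

```python
import numpy as np


def simulate_target_position(actions):
    """Current code's reading: action - 1 is the position to hold (-1, 0, +1)."""
    posns = np.zeros(len(actions), dtype=int)
    trades = np.zeros(len(actions), dtype=int)
    bod_posn = 0  # beginning-of-day position, assumed flat at the start
    for step, action in enumerate(actions):
        posns[step] = action - 1
        trades[step] = posns[step] - bod_posn  # trade needed to reach the target
        bod_posn = posns[step]
    return posns, trades


def simulate_incremental_trade(actions):
    """Proposed reading: action - 1 is the trade (sell 1, hold, buy 1)."""
    posns = np.zeros(len(actions), dtype=int)
    trades = np.zeros(len(actions), dtype=int)
    bod_posn = 0
    for step, action in enumerate(actions):
        trades[step] = action - 1
        posns[step] = bod_posn + trades[step]  # position accumulates over steps
        bod_posn = posns[step]
    return posns, trades


actions = [2, 2, 0, 1]  # hypothetical action sequence
target_posns, target_trades = simulate_target_position(actions)
incr_posns, incr_trades = simulate_incremental_trade(actions)
print("target position:  ", target_posns.tolist(), target_trades.tolist())
print("incremental trade:", incr_posns.tolist(), incr_trades.tolist())
```

Under the first reading the position always stays within -1..+1, while under the second it can grow without bound (here it reaches 2), so which one is correct depends on what an action is meant to be.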