lucasfbn / Reddit-Sentiment-Reinforcement-Learning

Stock trading using Reddit sentiment data and reinforcement learning.

Regarding continuous evaluation reward improvements and high probabilities of the agents. #83

Closed lucasfbn closed 3 years ago

lucasfbn commented 3 years ago

Combined issue for #81 and #72

lucasfbn commented 3 years ago

Key findings 17-08-21

Regarding the following experiments:

Whether the evaluation reward [1] is (more or less) constant:

1) when training the same agent multiple times on 1 episode

2) when training the same agent multiple times on 1 episode without shuffling the sequences

3) when training the same agent multiple times on 1 episode with an exploration rate of 0.02

1.)

No, the evaluation reward fluctuates a lot, although the training reward stays more or less constant. I.e., the fluctuation is most likely not due to a local minimum.

Relevant experiment id: Exp: Constant Reward 1 Episode
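One way to put a number on "fluctuates a lot" versus "more or less constant" is to compare the spread of both rewards across the repeated runs. The following is only a minimal sketch: the reward lists are made-up placeholder values, not results from the experiment, and `coefficient_of_variation` is a hypothetical helper, not part of this repository.

```python
from statistics import mean, pstdev


def coefficient_of_variation(rewards):
    """Population standard deviation divided by the absolute mean."""
    m = mean(rewards)
    if m == 0:
        raise ValueError("mean reward is zero; coefficient of variation is undefined")
    return pstdev(rewards) / abs(m)


# Placeholder per-run rewards (invented for illustration only).
# In practice these would be read from the runs of the experiment.
train_rewards = [10.0, 10.5, 9.5, 10.0]
eval_rewards = [2.0, 8.0, 5.0, 13.0]

train_cv = coefficient_of_variation(train_rewards)
eval_cv = coefficient_of_variation(eval_rewards)

# A much larger value for the evaluation reward would match the observation above.
print(f"train CV: {train_cv:.3f}, eval CV: {eval_cv:.3f}")
```

The coefficient of variation is used here because the train and evaluation rewards are probably on different scales, so raw standard deviations would not be directly comparable.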

2.)

Strangely, the agent isn't able to learn anything.

Relevant experiment id: Exp: Constant Reward 1 Episode, no shuffle

3.)

As in 1.), the evaluation reward still fluctuates a lot. However, the resulting probabilities are lower than those of an agent trained without exploration (compare the eval_probability_stats.csv of the runs in the experiment id of 1.) with the ones in the experiment id of 3.)).

Relevant experiment id: Exp: Constant Reward 1 Episode, exploration on
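For reference, a minimal sketch of what an exploration rate of 0.02 could mean in practice. It assumes that exploration replaces the policy's action with a uniformly random one with probability 0.02; whether the agent framework used here works exactly like this is an assumption. The function name and the choice of 3 actions are hypothetical as well.

```python
import random


def act_with_exploration(policy_action, n_actions, exploration_rate=0.02, rng=random):
    """Return a uniformly random action with probability `exploration_rate`,
    otherwise return the action chosen by the policy."""
    if rng.random() < exploration_rate:
        return rng.randrange(n_actions)
    return policy_action


# Assumption: 3 discrete actions, and the policy always picks action 0.
rng = random.Random(0)
n_steps = 100_000
overridden = sum(
    act_with_exploration(0, 3, 0.02, rng) != 0 for _ in range(n_steps)
)

# Roughly 0.02 * 2/3 of the steps end up with a different action,
# since the random draw can also land on the policy's own action.
print(f"fraction of overridden actions: {overridden / n_steps:.4f}")
```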

[1] Evaluation reward refers to the reward from the evaluation framework, not the reward from the environment the agent is trained with.

Regarding the differences between agents with high and low evaluation reward

Examined on experiment 1).

There seems to be no obvious difference between the two

but

We might therefore conclude that the changes in evaluation reward are due to some learned policies being more beneficial to the evaluation reward than others.

Therefore, the following steps are proposed:

Open questions

lucasfbn commented 3 years ago

Moved to #92.