llSourcell / Reinforcement_Learning_for_Stock_Prediction

This is the code for "Reinforcement Learning for Stock Prediction" by Siraj Raval on YouTube

Reward doesn't go up #10

Open · star277hk opened 6 years ago

star277hk commented 6 years ago

After each episode I expect the agent to have learnt something from the previous ones, but when I plot the reward over 1000 episodes, it doesn't go up. Does this mean the agent can't learn from the dataset? I used EURUSD 2010-2015, 1567 records in total, with a window size of 3.

[screenshot: reward plot over 1000 episodes]
yeshengyi commented 6 years ago

Same here. I think the training process should filter out episodes with less-than-expected profits.

madytyoo commented 6 years ago

I would suggest adjusting the normalization function; the implementation computes the sigmoid of the returns:

sigmoid(block[i + 1] - block[i])

The returns in the sample stock file range between +/-100. For currencies, the returns range between +/-0.001 if you are using a raw price data set. You could try expressing the return in pips:

sigmoid( (block[i + 1] - block[i])*10000)
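To see why the pip scaling matters, here is a minimal, self-contained sketch of the normalization (the two sample closes are hypothetical; the `sigmoid` helper mirrors the one in the repo but adds an overflow guard):

```python
import math

def sigmoid(x):
    """Standard logistic function, guarded against overflow for large |x|."""
    if x < -60:
        return 0.0
    if x > 60:
        return 1.0
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical example: two consecutive EURUSD closes, a 1-pip move.
prev_close, next_close = 1.30000, 1.30010

# Raw return: sigmoid(0.0001) is ~0.500025, i.e. nearly indistinguishable
# from a flat market, so every state looks the same to the network.
raw = sigmoid(next_close - prev_close)

# Scaled to pips: sigmoid(1.0) is ~0.731, a usable signal.
scaled = sigmoid((next_close - prev_close) * 10000)
```

Without the scaling, all FX inputs collapse toward 0.5 and the gradient signal through the network is effectively zero.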

MonaTanggg commented 6 years ago

@madytyoo Sorry, I tried, but the reward didn't go up after multiplying by 10000 or by 1000.

madytyoo commented 6 years ago

@MonaTanggg I ran a test with EURUSD data 2009-2017, 1000 episodes, logging the reward to TensorBoard. I suggest checking your data set.

[screenshot: TensorBoard reward curve]

star277hk commented 6 years ago

@madytyoo Did you edit anything in your source code/dataset? I wonder why I got a totally different result. May I know your tensorflow version as well? Not sure which part went wrong. Thanks in advance.

madytyoo commented 6 years ago

@star277hk I made the changes suggested by xtr33me (see: https://github.com/llSourcell/Reinforcement_Learning_for_Stock_Prediction/pull/8). In addition, I multiplied the EURUSD data by 1000 in the getStockDataVec function:

vec.append(float(line.split(",")[4])*1000)

As I mentioned, for currencies the returns range of about +/-0.001 is too small to get a result. I ran a test without multiplying the price by 1000 (see below) and got a result similar to your chart. Cheers

[screenshot: reward plot without the price scaling]
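The change amounts to a one-line scaling in the data loader. A minimal sketch, with the function simplified to take a file path instead of a symbol key, and a hypothetical sample CSV in the usual Date,Open,High,Low,Close layout (the column index 4 follows the snippet quoted above):

```python
import os
import tempfile

def getStockDataVec(path, scale=1000):
    """Read close prices (column index 4) from a CSV, multiplied by `scale`
    so that FX returns aren't flattened to ~0.5 by the sigmoid."""
    vec = []
    with open(path) as f:
        next(f)  # skip the header row
        for line in f:
            vec.append(float(line.split(",")[4]) * scale)
    return vec

# Hypothetical two-row sample to demonstrate the scaling.
sample = ("Date,Open,High,Low,Close\n"
          "2010-07-01,1.2495,1.2520,1.2480,1.2500\n"
          "2010-07-02,1.2500,1.2530,1.2490,1.2510\n")
path = os.path.join(tempfile.mkdtemp(), "EURUSD.csv")
with open(path, "w") as f:
    f.write(sample)

prices = getStockDataVec(path)  # close prices scaled by 1000
```

After scaling, the day-to-day differences fed to the sigmoid are on the order of 1.0 rather than 0.001.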

star277hk commented 6 years ago

@madytyoo Yes, I used xtr33me's #8 and added the times-1000 scaling in vec.append(float(line.split(",")[4])*1000), but still no luck. I used EURUSD ASK 01.07.2010-01.07.2015, daily data, window size 50, with tensorflow 1.3. May I know your settings?

madytyoo commented 6 years ago

I'm using EURUSD BID 01.01.2010-31.12.2017, daily data, window size 10, 1000 episodes, tensorflow 1.3.

colbyham commented 6 years ago

I added all columns of the dataset (high, low, volume, etc.) to the input layer and have been achieving a max gain of ~60% over ten years on the GSPC dataset, which is not great, but at least it's progress. Average drawdown is around 25-35%. I think adding actions that control how much of your portfolio to hold, sell, or buy would be helpful; in other words, bucketed ranges of trade sizes based on portfolio value and/or buying power.
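The bucketed-action idea above could be sketched as a discretised action space. This is purely illustrative, not code from the repo; the verbs, fractions, and helper names are all hypothetical:

```python
# Hypothetical replacement for the repo's 3-action space {sit, buy, sell}:
# each trade action carries a fraction of buying power or holdings.
FRACTIONS = [0.25, 0.50, 1.00]

def build_action_space():
    """Enumerate (verb, fraction) pairs: 1 'sit' + buy/sell per fraction."""
    actions = [("sit", 0.0)]
    for f in FRACTIONS:
        actions.append(("buy", f))
        actions.append(("sell", f))
    return actions

def trade_size(action, buying_power, holdings):
    """Translate a (verb, fraction) action into a concrete order size."""
    verb, frac = action
    if verb == "buy":
        return frac * buying_power   # cash to spend
    if verb == "sell":
        return frac * holdings       # units to liquidate
    return 0.0

actions = build_action_space()  # 7 discrete actions with FRACTIONS above
```

The agent's output layer would then have one unit per entry in `actions`, and the environment would use `trade_size` when applying the chosen action.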

MonaTanggg commented 6 years ago

@madytyoo Strange, I'm already using almost the same settings but the agent still can't learn. Are you learning only from the close price, or did you add other values? If you don't mind, I hope you can share your project where the reward rises, so I can figure out where the problem is. Thanks.

xxdaggerxx commented 6 years ago

Kind of pointless to post code that doesn't work.