Chapter 22_deep_reinforcement_learning Google Colab Python 3.10 #317

Closed martin0 closed 5 months ago

martin0 commented 5 months ago

Describe the bug

In the notebook 04_q_learning_for_trading, the Train Agent step fails inside `DDQNAgent.experience_replay()` at the line `q_values[[self.idx, actions]] = targets` with:

    ValueError: shape mismatch: value array of shape (4096,) could not be broadcast to indexing result of shape (2,4096,3)

Output is:

    (approximately 35 steps)
    10 | 00:00:03 | Agent: -38.6% (-38.6%) | Market: 4.6% ( 4.6%) | Wins: 20.0% | eps: 0.960
    (approximately 77 steps)

    ValueError                                Traceback (most recent call last)
    in <cell line: 3>()
         13                     0.0 if done else 1.0)
         14         if ddqn.train:
    ---> 15             ddqn.experience_replay()
         16         if done:
         17             break

    in experience_replay(self)
        107
        108         q_values = self.online_network.predict_on_batch(states)
    --> 109         q_values[[self.idx, actions]] = targets
        110
        111         loss = self.online_network.train_on_batch(x=states, y=q_values)

    ValueError: shape mismatch: value array of shape (4096,) could not be broadcast to indexing result of shape (2,4096,3)
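
The reported shape `(2,4096,3)` is what NumPy produces when the list `[self.idx, actions]` is read as a single 2-D index array rather than as a pair of row and column indices, which is how newer NumPy releases appear to treat a list index. The following is a minimal NumPy-only sketch of that behaviour, using a small batch in place of 4096; the variable names mirror the notebook, but the values are made up for illustration, and whether tuple indexing alone fixes the notebook is an assumption (there, `self.idx` is a TensorFlow tensor and may first need converting to a NumPy array).

```python
import numpy as np

batch_size, num_actions = 8, 3  # small stand-ins for 4096 samples and 3 actions

q_values = np.zeros((batch_size, num_actions), dtype=np.float32)
idx = np.arange(batch_size)                      # one row index per sample
actions = np.array([0, 0, 2, 1, 1, 2, 0, 1])     # illustrative actions
targets = np.arange(batch_size, dtype=np.float32) + 1.0

# Stacking the two index vectors gives ONE index array of shape (2, batch_size).
# Indexing with it selects whole rows, so the result has shape (2, batch_size, 3),
# which a (batch_size,) target vector cannot be broadcast into.
list_indexed_shape = q_values[np.array([idx, actions])].shape

# Passing the two vectors as a tuple selects one (row, column) cell per sample,
# so the selection has shape (batch_size,) and the assignment succeeds.
q_values[idx, actions] = targets
```

With tuple indexing, each sample's target lands in the column of the action that was taken and every other entry is left untouched.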

I am running on Google Colab, where session restarts wipe out the installed talib package. I am attaching the modified notebook, the talib wheel, and the underlying talib libraries. I installed the talib package using instructions from another notebook, "Install Ta-lib on Google colab" (also attached, with modifications to save the built pieces so that a full rebuild is not necessary in each new session).

For the extra things, I have a python folder and a data folder in the root of my Google Drive.

The python folder also includes the gym environments, specifically the one used by this notebook: the directory python/gymenvs/machine_learning_for_trading should hold the contents of this zip file.

Create the necessary data as per the instructions in chapter 2 of the book, then run the notebook.

Expected behavior

I expect the 04_q_learning_for_trading notebook to run as designed by the author (please).

Environment (if you are not using the latest version of the Docker image): Google Colab using a T4 GPU

Additional context

Given the modification I had to make for observation_space, here is its runtime value:

    Box([ -0.5186916 -13.186786 -9.157841 -6.9791217 -5.2897873 -1.5290436 -5.4077215 -0.6155895 -2.762308 -3.9641087],
        [ 0.3321519 11.431712 10.235379 9.135829 8.238228 1.4996951 5.7050333 5.4152718 2.7126348 2.7631414],
        (10,), float32)

I will continue to troubleshoot the problem and post any further findings. Happy to answer any questions to clarify this post.

martin0 commented 5 months ago

Looking a bit deeper (no pun intended): when creating the environment, I got a warning that is perhaps relevant.

Cell:

    trading_environment = gym.make('trading-v0',
                                   ticker='AAPL',
                                   max_episode_steps=trading_days,
                                   trading_days=trading_days,
                                   trading_cost_bps=trading_cost_bps,
                                   time_cost_bps=time_cost_bps)
    trading_environment.seed(42)

Output:

    INFO:gymenvs.machine_learning_for_trading.trading_env:gymenvs.machine_learning_for_trading.trading_env logger started.
    INFO:gymenvs.machine_learning_for_trading.trading_env:loading data for AAPL...
    INFO:gymenvs.machine_learning_for_trading.trading_env:got data for AAPL...
    INFO:gymenvs.machine_learning_for_trading.trading_env:None
    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 9367 entries, (Timestamp('1981-01-30 00:00:00'), 'AAPL') to (Timestamp('2018-03-27 00:00:00'), 'AAPL')
    Data columns (total 10 columns):
     #   Column   Non-Null Count  Dtype
    ---  ------   --------------  -----
     0   returns  9367 non-null   float64
     1   ret_2    9367 non-null   float64
     2   ret_5    9367 non-null   float64
     3   ret_10   9367 non-null   float64
     4   ret_21   9367 non-null   float64
     5   rsi      9367 non-null   float64
     6   macd     9367 non-null   float64
     7   atr      9367 non-null   float64
     8   stoch    9367 non-null   float64
     9   ultosc   9367 non-null   float64
    dtypes: float64(10)
    memory usage: 1.5+ MB
    /usr/local/lib/python3.10/dist-packages/gym/ DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
      deprecation(
    /usr/local/lib/python3.10/dist-packages/gym/wrappers/ DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
      deprecation(
    [42]
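
The deprecation warning refers to the two shapes that an environment's `step()` result can take: the old API returns `(obs, reward, done, info)` with one boolean, while the new API returns `(obs, reward, terminated, truncated, info)` with two. It is likely unrelated to the shape mismatch, but a training loop that has to survive both can normalise the result first. A minimal sketch, assuming a hypothetical helper `unpack_step` that is not part of gym or of the notebook:

```python
from typing import Any, Dict, Tuple


def unpack_step(result: tuple) -> Tuple[Any, float, bool, Dict]:
    """Normalise an old-style (4-item) or new-style (5-item) step result.

    Hypothetical helper for illustration only.
    """
    if len(result) == 5:
        # New step API: two booleans, combined here into a single "done" flag.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    # Old step API: a single "done" boolean.
    obs, reward, done, info = result
    return obs, reward, bool(done), info


# Illustrative step results, not produced by a real environment.
old_style = ([0.1, 0.2], 1.0, False, {})
new_style = ([0.1, 0.2], 1.0, False, True, {})

old_done = unpack_step(old_style)[2]
new_done = unpack_step(new_style)[2]
```

Collapsing `terminated` and `truncated` into one flag loses the distinction between the two, which is acceptable only when the loop treats both as the end of an episode.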

martin0 commented 5 months ago

piplist.txt: listing from the Google Colab command `!pip list`

Some debug information. (For some reason, I can't add a new comment without "Close with comment"?)

Some debug info just before the line

q_values[[self.idx, actions]] = targets

    ipdb> len(rewards)
    4096
    ipdb> len(not_done)
    4096
    ipdb> self.gamma
    (0.99,)
    ipdb> target_q_values
    <tf.Tensor: shape=(4096,), dtype=float32, numpy=
    array([ 0.05242079, -0.00775741, -0.23272878, ...,  0.01323538,
           -0.13362771, -0.28871116], dtype=float32)>
    ipdb> states.shape
    (4096, 10)
    ipdb> q_values.shape
    (4096, 3)
    ipdb> actions
    array([0, 0, 2, ..., 0, 1, 1])
    ipdb> len(actions)
    4096
    ipdb> self.idx
    <tf.Tensor: shape=(4096,), dtype=int32, numpy=array([   0,    1,    2, ..., 4093, 4094, 4095], dtype=int32)>

I thought I would try the preceding notebook, but it has this comment:

"See the notebook 04_q_learning_for_trading.ipynb for instructions on upgrading TensorFlow to version 2.2, required by the code below."

I see no information about upgrading to TensorFlow version 2.2. Is that the problem?

martin0 commented 5 months ago

Here is a link to a GPT-4 conversation I had about the issue. I haven't tried the suggestion yet.

(Bard said it was beyond its capabilities at present, even though Google demonstrated competitive coding with Gemini last month.)

martin0 commented 5 months ago

Following ChatGPT suggestion 1 (see above), the following modifications to experience_replay() seem to allow the model to run :-)

    # Reshape targets to a column vector of shape (batch, 1) so it broadcasts against q_values
    targets_np = targets.numpy().reshape(-1, 1)
    # One-hot mask of shape (batch, num_actions) marking the action taken in each sample
    mask = tf.one_hot(actions, self.num_actions)
    # Keep the predicted Q-values for the other actions; replace the taken action's value with its target
    q_values = q_values * (1 - mask) + targets_np * mask
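
To check that the masking approach writes the same values that per-sample indexing would, here is a NumPy-only sketch of the same idea. It assumes a hypothetical helper `apply_targets_with_mask` and uses `np.eye(...)[actions]` in place of `tf.one_hot`; the Q-values, actions, and targets are made-up illustrative numbers, not values from the notebook.

```python
import numpy as np


def apply_targets_with_mask(q_values, actions, targets):
    """Overwrite the Q-value of each taken action with its target.

    Hypothetical NumPy stand-in for the masking update; not the notebook's code.
    """
    num_actions = q_values.shape[1]
    # One-hot mask of shape (batch, num_actions): 1 where the action was taken.
    mask = np.eye(num_actions, dtype=q_values.dtype)[actions]
    # Column vector of targets broadcasts across the action axis.
    return q_values * (1 - mask) + targets.reshape(-1, 1) * mask


q_values = np.array([[0.1, 0.2, 0.3],
                     [0.4, 0.5, 0.6]], dtype=np.float32)
actions = np.array([2, 0])
targets = np.array([9.0, 7.0], dtype=np.float32)

updated = apply_targets_with_mask(q_values, actions, targets)

# Reference result: assign one (row, column) cell per sample with tuple indexing.
expected = q_values.copy()
expected[np.arange(len(actions)), actions] = targets
```

Both forms leave the Q-values of the untaken actions unchanged, so in this sketch the masked update and the indexed assignment give the same array.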