stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

Chapter 22_deep_reinforcement_learning Google Colab Python 3.10 #317

Closed martin0 closed 5 months ago

martin0 commented 5 months ago

Describe the bug: In notebook 04_q_learning_for_trading, under "Train Agent", `DDQNAgent.experience_replay()` fails at the line `q_values[[self.idx, actions]] = targets` with: `ValueError: shape mismatch: value array of shape (4096,) could not be broadcast to indexing result of shape (2,4096,3)`

Output:

```
(approximately 35 steps)
10 | 00:00:03 | Agent: -38.6% (-38.6%) | Market: 4.6% ( 4.6%) | Wins: 20.0% | eps: 0.960
(approximately 77 steps)

ValueError                                Traceback (most recent call last)
<ipython-input> in <cell line: 3>()
     13                     0.0 if done else 1.0)
     14     if ddqn.train:
---> 15         ddqn.experience_replay()
     16     if done:
     17         break

in experience_replay(self)
    107
    108         q_values = self.online_network.predict_on_batch(states)
--> 109         q_values[[self.idx, actions]] = targets
    110
    111         loss = self.online_network.train_on_batch(x=states, y=q_values)

ValueError: shape mismatch: value array of shape (4096,) could not be broadcast to indexing result of shape (2,4096,3)
```
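The shape in the error can be reproduced with plain NumPy: a *list* containing two index arrays is converted to a single (2, 4096) index into axis 0, which adds a leading dimension of 2, whereas a *tuple* of index arrays selects one element per (row, column) pair. A minimal sketch with illustrative shapes (not the repo's actual code):

```python
import numpy as np

# Illustrative shapes matching the traceback: a 4096-sample batch, 3 actions.
q_values = np.zeros((4096, 3), dtype=np.float32)
idx = np.arange(4096)
actions = np.random.randint(0, 3, size=4096)

# A list of two index arrays becomes one (2, 4096) array indexing axis 0,
# so the result has shape (2, 4096, 3) -- hence the broadcast failure.
print(q_values[[idx, actions]].shape)   # (2, 4096, 3)

# A tuple of index arrays picks one element per (row, column) pair.
print(q_values[idx, actions].shape)     # (4096,)
```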

To Reproduce

Running on Google Colab (where session restarts wipe out the installed talib package). I am attaching the modified notebook, the talib wheel, and the underlying talib libraries. I installed the talib package using instructions from another notebook, "Install Ta-lib on Google colab" (also attached, with modifications to save the build artifacts so a full rebuild is not necessary each new session): Copy of Install Ta-lib on Google colab.zip

For the additional files, I have a `python` folder and a `data` folder in the root of my Google Drive: talib_wheel_and_lib_and_config.zip, 04_q_learning_for_trading.zip

The `python` folder also includes the gym environments; specifically, the directory `python/gymenvs/machine_learning_for_trading` should hold the contents of this zip file: trading_env.zip

Steps to reproduce the behavior:

  1. Create the necessary data per the instructions in chapter 2 of the book
  2. Run the notebook

Expected behavior: the 04_q_learning_for_trading notebook runs as designed by the author (please)


Environment: not the latest Docker image; Google Colab using a T4 GPU

Additional context: given the mod I had to make for observation_space, here is a runtime value of the observation_space:

```
Box([ -0.5186916 -13.186786   -9.157841   -6.9791217  -5.2897873  -1.5290436  -5.4077215  -0.6155895  -2.762308   -3.9641087],
    [  0.3321519  11.431712   10.235379    9.135829    8.238228    1.4996951   5.7050333   5.4152718   2.7126348   2.7631414],
    (10,), float32)
```

I will continue to troubleshoot the problem, and post further updates I find. Happy to answer any questions to clarify this post.

Thanks Martin

martin0 commented 5 months ago

Looking a bit deeper (no pun intended): when creating the environment, I got a warning which is perhaps relevant.

Cell:

```python
trading_environment = gym.make('trading-v0',
                               ticker='AAPL',
                               max_episode_steps=trading_days,
                               trading_days=trading_days,
                               trading_cost_bps=trading_cost_bps,
                               time_cost_bps=time_cost_bps)
trading_environment.seed(42)
```

Output:

```
INFO:gymenvs.machine_learning_for_trading.trading_env:gymenvs.machine_learning_for_trading.trading_env logger started.
INFO:gymenvs.machine_learning_for_trading.trading_env:loading data for AAPL...
INFO:gymenvs.machine_learning_for_trading.trading_env:got data for AAPL...
INFO:gymenvs.machine_learning_for_trading.trading_env:None
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 9367 entries, (Timestamp('1981-01-30 00:00:00'), 'AAPL') to (Timestamp('2018-03-27 00:00:00'), 'AAPL')
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   returns  9367 non-null   float64
 1   ret_2    9367 non-null   float64
 2   ret_5    9367 non-null   float64
 3   ret_10   9367 non-null   float64
 4   ret_21   9367 non-null   float64
 5   rsi      9367 non-null   float64
 6   macd     9367 non-null   float64
 7   atr      9367 non-null   float64
 8   stoch    9367 non-null   float64
 9   ultosc   9367 non-null   float64
dtypes: float64(10)
memory usage: 1.5+ MB
/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/usr/local/lib/python3.10/dist-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
[42]
```

martin0 commented 5 months ago

piplist.txt — listing from the Google Colab command `!pip list`

Some debug information. (For some reason, I can't add a new comment without choosing "Close with comment"??)

Some debug info just before the line `q_values[[self.idx, actions]] = targets`:

```
ipdb> len(rewards)
4096
ipdb> len(not_done)
4096
ipdb> self.gamma
(0.99,)
ipdb> target_q_values
<tf.Tensor: shape=(4096,), dtype=float32, numpy=array([ 0.05242079, -0.00775741, -0.23272878, ...,  0.01323538, -0.13362771, -0.28871116], dtype=float32)>
ipdb> states.shape
(4096, 10)
ipdb> q_values.shape
(4096, 3)
ipdb> actions
array([0, 0, 2, ..., 0, 1, 1])
ipdb> len(actions)
4096
ipdb> self.idx
<tf.Tensor: shape=(4096,), dtype=int32, numpy=array([   0,    1,    2, ..., 4093, 4094, 4095], dtype=int32)>
```
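Given those shapes (`self.idx` and `targets` are tf.Tensors while `q_values` is a NumPy array), one minimal fix would be tuple indexing with plain NumPy arrays. A sketch under the assumption that converting the tensors via `.numpy()` is acceptable here; the variable names mirror the debug session but this is not the repo's actual code:

```python
import numpy as np

# Shapes taken from the ipdb session above; the tf.Tensors (self.idx,
# targets) are assumed to have been converted to NumPy via .numpy().
batch_size, num_actions = 4096, 3
q_values = np.zeros((batch_size, num_actions), dtype=np.float32)
targets = np.linspace(-1.0, 1.0, batch_size, dtype=np.float32)
idx = np.arange(batch_size)                      # stands in for self.idx
actions = np.random.randint(0, num_actions, size=batch_size)

# Tuple indexing (no extra brackets) writes one target per (row, action)
# pair: the (4096,) value array now matches the (4096,) indexing result.
q_values[idx, actions] = targets
```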

Thought I would try the preceding notebook, but it has this comment...

"See the notebook 04_q_learning_for_trading.ipynb for instructions on upgrading TensorFlow to version 2.2, required by the code below." I see no information about upgrading to TensorFlow version 2.2. Is that the problem?

martin0 commented 5 months ago

Here is a link to a GPT-4 conversation I had about the issue. I haven't tried the suggestion yet. https://chat.openai.com/share/5469961d-7b58-4f8e-a176-a937ae132a3f

(Bard said it was beyond its capabilities at present, even though Google demonstrated competitive coding with Gemini last month.)

martin0 commented 5 months ago

Following ChatGPT suggestion 1 (see above), the following mods to experience_replay() seem to let the model run :-)

```python
# Reshape targets to a 2D column so it broadcasts against q_values
targets_np = targets.numpy().reshape(-1, 1)
# Create a one-hot mask for the actions taken
mask = tf.one_hot(actions, self.num_actions)
# Update the q_values only for the actions taken
q_values = q_values * (1 - mask) + targets_np * mask
```
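For what it's worth, the same masking idea can be checked standalone in plain NumPy (shapes assumed from the debug session; `np.eye(...)[actions]` plays the role of `tf.one_hot`, and all names here are illustrative):

```python
import numpy as np

# Assumed shapes from the debug session: 4096-sample batch, 3 actions.
batch_size, num_actions = 4096, 3
rng = np.random.default_rng(42)
q_values = rng.random((batch_size, num_actions)).astype(np.float32)
targets = np.linspace(-1.0, 1.0, batch_size, dtype=np.float32)
actions = rng.integers(0, num_actions, size=batch_size)

# One-hot mask for the actions taken, e.g. action 2 -> [0, 0, 1].
mask = np.eye(num_actions, dtype=np.float32)[actions]
# Keep Q-values of untaken actions, overwrite those of taken actions.
updated = q_values * (1 - mask) + targets.reshape(-1, 1) * mask
```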