[ ] Use the ‘listing6_1_dqn_tensorflow.ipynb’ file in the assignment (from the book: Nimish Sanghi, “Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym”)
[x] Install required packages in the sample code.
[x] [10pts] Run the program and check whether it works correctly.
1-2. Sample Code Using PyTorch
If you are familiar with PyTorch, you can use the PyTorch version of the sample code.
[ ] You will use the PyTorch library. To get started, follow the instructions to install PyTorch, then go through a tutorial on the basics of PyTorch. Refer to https://pytorch.org/
[ ] Use the ‘listing6_1_dqn_pytorch.ipynb’ file in the assignment (from the book: Nimish Sanghi, “Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym”)
[ ] Install required packages in the sample code.
[ ] [10pts] Run the program and check whether it works correctly.
2-1. Run the program with different parameters
After you make sure your program works, do the following. Show the performance for each parameter value (e.g., mean accumulated episode reward, size of error, etc.).
[x] [5pts] Run the program 5 times with different replay memory sizes.
[x] [4pts] Run the program using different gradient-descent optimizers, including Adadelta, Adagrad, Adam (or AdamW), or RMSProp.
[x] [6pts] Change the way target network parameters are updated: from the Polyak method to periodic update, or vice versa.
[x] Change the Q network configuration (make sure you change the target network as well):
[x] [6pts] Add additional layers.
[x] [6pts] Change the activation function from ReLU to others.
[ ] [6pts] Change the error function: with and without an entropy term.
[ ] [8pts] Change the CNN to a neural network, or vice versa.
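For the target-network item, here is a minimal framework-agnostic sketch of the two update styles. It is not the code from the book's notebooks: the parameters are plain dicts of floats standing in for network weights, and the names `polyak_update`, `periodic_update`, `tau`, and `sync_every` are hypothetical, chosen only for illustration.

```python
# Assumption: weights are modeled as dicts of name -> list of floats.
# In the real notebooks these would be TensorFlow or PyTorch tensors.

def polyak_update(online, target, tau=0.005):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    for name, online_values in online.items():
        target[name] = [
            tau * w + (1.0 - tau) * t
            for w, t in zip(online_values, target[name])
        ]


def periodic_update(online, target, step, sync_every=1000):
    """Hard update: copy online weights into target every `sync_every` steps.

    Returns True when a copy happened, False otherwise.
    """
    if step % sync_every == 0:
        for name, online_values in online.items():
            target[name] = list(online_values)
        return True
    return False


online = {"w": [1.0, 2.0], "b": [0.5]}
target = {"w": [0.0, 0.0], "b": [0.0]}

# Soft update: each target value moves 10% of the way toward the online value.
polyak_update(online, target, tau=0.1)
print(target)

# Hard update: nothing happens until the step count hits a multiple of sync_every.
print(periodic_update(online, target, step=3, sync_every=5))   # no copy yet
print(periodic_update(online, target, step=5, sync_every=5))   # copy happens
print(target)
```

The Polyak version is called on every training step with a small `tau`, so the target drifts slowly. The periodic version leaves the target frozen between syncs. Switching between them in the notebook should only touch the place where the target network gets refreshed.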
3-1. In your homework report:
[x] For each category of Q. 2,
[x] explicitly and clearly explain the part you have changed in the program
[x] explain why and how you changed this part.
[x] show the results
[x] If the program shows a video of CartPole, include the video as well.
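For the "show the results" part, one possible way to condense repeated runs into a comparable number per setting is sketched below. The helper `summarize_runs` and the setting labels are hypothetical, and the numbers are placeholders, not measured results from the assignment.

```python
from statistics import mean, stdev


def summarize_runs(rewards_by_setting):
    """Return {setting: (mean, std)} of episode reward across repeated runs."""
    summary = {}
    for setting, rewards in rewards_by_setting.items():
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = stdev(rewards) if len(rewards) > 1 else 0.0
        summary[setting] = (mean(rewards), spread)
    return summary


# Placeholder values only, to show the shape of the input.
example = {
    "replay_size=1000": [10.0, 20.0, 30.0],
    "replay_size=10000": [40.0],
}

for setting, (avg, spread) in summarize_runs(example).items():
    print(f"{setting}: mean={avg:.1f} std={spread:.1f}")
```

Each list would hold the mean accumulated episode reward from one run of the notebook with that setting, so the table in the report compares settings rather than individual runs.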
This class failed hard 😵‍💫 I still got an A, but I didn't get as much out of it as I wanted to. And with my current level of motivation, I won't be making much progress here.
4-1. (extra points, max: 8pts) Policy gradient method (REINFORCE algorithm)