HW3 - DQN & REINFORCE

nickumia commented 1 year ago

1-1. Sample Code Using TensorFlow

If you want to use TensorFlow:
- [ ] Make sure TensorFlow is installed on your computer.
- [ ] Choose one of the sample codes.
  - [x] https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial (from the TensorFlow homepage) OR
  - [ ] ‘listing6_1_dqn_tensorflow.ipynb’ file in the assignment (from the book Nimish Sanghi, “Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym”)
- [x] Install required packages in the sample code.
- [x] [10pts] Run the program and check whether it works correctly.

1-2. Sample Code Using PyTorch

If you are familiar with PyTorch, you can use the PyTorch sample code.
- [ ] You will use the PyTorch library. To get started, follow the instructions to install PyTorch, then go through a tutorial on the basics of PyTorch. Refer to https://pytorch.org/
- [ ] Choose one of the sample codes.
  - [ ] https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html (from the PyTorch homepage) OR
  - [ ] ‘listing6_1_dqn_pytorch.ipynb’ file in the assignment (from the book Nimish Sanghi, “Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym”)
- [ ] Install required packages in the sample code.
- [ ] [10pts] Run the program and check whether it works correctly.

2-1. Run the program with different parameters

After you make sure your program works, do the following. Show the performance for each parameter value (e.g., mean accumulated episode reward, size of error, etc.).
- [x] [5pts] run the program 5 times with different replay memory sizes.
- [x] [4pts] run the program using different gradient-descent optimizers, such as Adadelta, Adagrad, Adam (or AdamW), or RMSProp.
- [x] [6pts] change the way the target network parameters are updated: from the Polyak method to periodic updates, or vice versa.
- [x] change the Q-network configuration (make sure you change the target network as well)
  - [x] [6pts] add additional layers
  - [x] [6pts] change the activation function from ReLU to others
- [ ] [6pts] change the error function: with and without an entropy term
- [ ] [8pts] change CNN to neural network, or vice versa
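
For the replay-memory and target-update items above, the knobs being varied can be sketched without any framework. This is only an illustration under stated assumptions: `ReplayMemory`, `polyak_update`, `periodic_update`, `tau`, and `period` are hypothetical names, not identifiers from either sample notebook, and network parameters are stood in for by plain lists of floats rather than tensors.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity):
        # `capacity` is the replay memory size varied in the experiment.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def polyak_update(target, online, tau):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def periodic_update(target, online, step, period):
    """Hard update: copy the online parameters every `period` steps."""
    return list(online) if step % period == 0 else list(target)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(i)
print(list(memory.buffer))  # [2, 3, 4]

print(polyak_update([0.0, 0.0], [1.0, 2.0], tau=0.5))  # [0.5, 1.0]
```

With a small `tau` the target network trails the online network smoothly at every step, while the periodic variant keeps the target frozen between copies. Comparing the two at matched effective update rates is one way to make the results comparable.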

3-1. In your homework report:

- [x] For each category of Q. 2:
  - [x] explicitly and clearly explain the part you have changed in the program
  - [x] explain why and how you changed this part
  - [x] show the results
- [x] If the program shows a video of CartPole, include the video as well.

4-1. (extra points, max: 8pts) Policy gradient method (REINFORCE algorithm)

This problem is about implementing the policy gradient method (REINFORCE algorithm).
- [ ] Use one of the following sample codes
  - [ ] listing7_1_reinforce_pytorch.ipynb OR
  - [ ] listing7_1_reinforce_tensorflow.ipynb
- [ ] [2pts] run the program. Show it works.
- [ ] [2pts/each] perform 4) and 5) of Q. 2
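
As a rough, framework-free sketch of what REINFORCE optimizes, the snippet below computes discounted returns for one episode and the scalar surrogate loss, with an optional entropy bonus. It is not taken from the listing7_1 notebooks: the function names, the `beta` coefficient, and the use of plain Python lists for action probabilities are all assumptions for illustration. In the real notebooks the loss would be built from tensors so that autodiff can produce gradients.

```python
import math


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reinforce_loss(action_probs, actions, rewards, gamma, beta=0.0):
    """Negative of sum_t G_t * log pi(a_t | s_t), minus beta * total entropy.

    action_probs[t] is the policy's probability vector at step t;
    beta = 0.0 disables the entropy bonus.
    """
    returns = discounted_returns(rewards, gamma)
    loss = 0.0
    for probs, a, g in zip(action_probs, actions, returns):
        loss -= g * math.log(probs[a])
        loss -= beta * entropy(probs)
    return loss


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Setting `beta` above zero lowers the loss for higher-entropy policies, which is one common way to encourage exploration. Running with `beta=0.0` and a small positive value gives a with/without-entropy comparison.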

nickumia commented 12 months ago

Homework started here.

nickumia commented 11 months ago

More time is required for this 😞

nickumia commented 10 months ago

This class failed hard 😵‍💫 I still got an A, but I didn't get as much out of it as I wanted to. And with my current level of motivation, I won't be making much progress here.

nickumia / cap6629

HW3 - DQN & REINFORCE #11