nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Initialize a Deep CNN from a flat vector of weights #25

Closed: schrum2 closed this issue 5 years ago

schrum2 commented 5 years ago

We eventually need to be able to do this either in the PPO code that uses TensorFlow, or in PyTorch (but we would need a working deep RL algorithm in PyTorch first). Either way, we need some way of passing a flat vector/array as a parameter and having those values fill out all of the weights of a deep CNN. This initialization scheme would be an alternative to the random initializers that are typically used when beginning training of a deep net.
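As a sketch of what this could look like in PyTorch (the toy network and shapes here are hypothetical, not the actual agent; torch.nn.utils.vector_to_parameters performs exactly this flat-to-shaped copy, in parameter registration order):

    import torch
    import torch.nn as nn
    from torch.nn.utils import vector_to_parameters

    # Toy stand-in for a policy CNN: Conv2d(3, 16, 8, stride=4) maps
    # an 84x84 RGB frame to 16 channels of 20x20.
    net = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=8, stride=4),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16 * 20 * 20, 32),
    )

    # One flat vector holding every weight and bias of the network.
    n = sum(p.numel() for p in net.parameters())
    flat = torch.randn(n)

    # Copy the flat vector into the network, parameter by parameter.
    vector_to_parameters(flat, net.parameters())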

schrum2 commented 5 years ago

The PyTorch PPO code from this repo seems promising: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. It can run Atari games, as long as you change the line in envs.py that has context='fork' to context='spawn' (this is a Windows vs. Unix issue).
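For reference, the change is presumably of this one-line form (assuming envs.py builds its vectorized environment with baselines' ShmemVecEnv, whose context argument selects the multiprocessing start method):

    # envs.py, illustrative: 'fork' exists only on Unix; 'spawn' also works on Windows.
    envs = ShmemVecEnv(env_fns, context='spawn')  # was: context='fork'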

I also found that I can save models using the --save_dir parameter.

Main issues remaining before we actually copy any of this code into this codebase:

1. Get it to run Sonic. Currently it runs Atari games, but I haven't been able to get it to run a Retro game yet.
2. Find out how to manually specify the weights of the network.

schrum2 commented 5 years ago

I went ahead and added code to the repo, though Sonic still does not work. However, I wanted to document how networks could be saved. For example:

    python main.py --env-name "Qbert-ramNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01 --save-dir "save" --save-interval 5

schrum2 commented 5 years ago

Sonic doesn't work, but here is the command I keep trying to execute:

    python main.py --env-name "SonicTheHedgehog-Genesis" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01 --save-dir "save" --save-interval 5

This results in the following error:

Traceback (most recent call last):
  File "main.py", line 196, in <module>
    main()
  File "main.py", line 116, in main
    rollouts.masks[step])
  File "E:\Users\he_de\workspace\gym-http-api\pytorch-a2c-ppo-acktr-gail\a2c_ppo_acktr\model.py", line 55, in act
    value, actor_features, rnn_hxs = self.base(inputs, rnn_hxs, masks)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Users\he_de\workspace\gym-http-api\pytorch-a2c-ppo-acktr-gail\a2c_ppo_acktr\model.py", line 190, in forward
    x = self.main(inputs / 255.0)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1406, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [8 x 27648], m2: [1568 x 512] at C:/w/1/s/tmp_conda_3.7_044431/conda/conda-bld/pytorch_1556686009173/work/aten/src\THC/generic/THCTensorMathBlas.cu:268

I think the main take-away is that the network is not sized for the Genesis input. The conv stack applied to 224x320 frames flattens to 27648 features (m1), but the first fully connected layer appears to be built for 84x84 Atari observations, expecting 1568 = 32*7*7 inputs (m2). We should look closely at other code where Sonic successfully executes and see how it manipulates the network and the input observation.

schrum2 commented 5 years ago

Careful reading of this page reveals why the Sonic input shape is [1, 12, 224, 320]. The (224, 320) is obviously the screen size, but why 12 channels? There are only 3 input channels (RGB), but the Sonic context stacks 4 frames per observation, and 4*3 is 12.
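The arithmetic as a quick sanity check:

    frames, rgb = 4, 3             # frame-stack depth x RGB channels
    height, width = 224, 320       # Genesis screen resolution
    obs_shape = (1, frames * rgb, height, width)
    print(obs_shape)               # (1, 12, 224, 320)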

schrum2 commented 5 years ago

The code in the pytorch-a2c-ppo-acktr-gail subdir works. At least, it is able to run and save models that play Sonic. I'll need to train it for a while to see if the models are any good.

The next step is to copy the relevant code from here over to the NSGA-II portion of the code and initialize networks with a flat vector of random weights. It is ok to copy code over to the NSGA2 subdir, but try to only copy code that is needed. Maybe just model.py?

schrum2 commented 5 years ago

According to https://discuss.pytorch.org/t/access-weights-of-a-specific-module-in-nn-sequential/3627, it should be easy to access and change the weights in PyTorch. I believe our PPO agent stores an nn.Sequential object in a variable called main, as in self.main. However, when PPO is initialized as ppo.PPO, the result is saved in a variable called agent, so you can access the nn.Sequential via agent.main.

Then, according to the post, you should be able to do something like agent.main.layer[0].weight to access the weights of the first layer, and so on. Hopefully this will also let us change the weights.

We probably also need to change the internal neuron biases.

In any case, this is a start. Verify if this works by looping through agent.main.layer and printing out the shape of each tensor to figure out how many weights we are dealing with.

schrum2 commented 5 years ago

Correction: The following shows you all of the parameter values:

    for p in agent.actor_critic.base.main.parameters():
        print(p)

schrum2 commented 5 years ago

This code was somewhat useful in getting an inside look at the network architecture, but we're not quite at the point of counting the total number of needed parameters, let alone setting them via a flat vector:

    for i in range(len(agent.actor_critic.base.main)):
        print(agent.actor_critic.base.main[i])
    for p in agent.actor_critic.base.main.parameters():
        print(p.size())

We need to be able to detect the ReLU layers and ignore them since they have no parameters.
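A sketch of the counting step, reusing the agent.actor_critic.base.main handle from above; ReLU modules contribute no entries to parameters(), so iterating over parameters directly skips them:

    main = agent.actor_critic.base.main
    # ReLU layers have no parameters, so they add nothing to this sum.
    total = sum(p.numel() for p in main.parameters())
    print(total)  # number of values the flat vector must supply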

schrum2 commented 5 years ago

This issue and #24 are the two highest priority issues.

nazaruka commented 5 years ago

Seems like all the Conv2D layers are four-dimensional, which means that accessing, say, the very first row of weights is done by calling param.data[0][0][0].

Writing this:

for param in agent.actor_critic.base.main.parameters():
    print(param.data[0][0][0])
    param.data[0][0][0] = torch.FloatTensor([1,2,3,4,5,6,7,8])
    print(param.data[0][0][0])

gives us the following output:

tensor([-0.0547, -0.0550, -0.1201,  0.0002,  0.0583,  0.0526,  0.0664, -0.0165])
tensor([1., 2., 3., 4., 5., 6., 7., 8.])

So we know for sure that the above code should be the foundation for how we replace the agent's weights. However, we still need to figure out how to carry over the values of a flat Tensor into one with several dimensions; PyTorch may have code for this.

schrum2 commented 5 years ago

Investigate the view method: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view
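A minimal illustration of view for this use case; the (32, 12, 8, 8) shape matches the first conv layer in the state_dict listing below, and the rest of the values are made up:

    import torch

    # Slice one layer's worth of values off the flat vector, then view
    # it in the layer's weight shape.
    flat = torch.randn(100000)
    n = 32 * 12 * 8 * 8              # 24576 values for a (32, 12, 8, 8) kernel
    w = flat[:n].view(32, 12, 8, 8)  # same storage, new shape; no copy
    rest = flat[n:]                  # remainder feeds the next layer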

schrum2 commented 5 years ago

This issue is the number 1 priority. Theoretically, it is the last thing we need to do before running an experiment, though I'm sure we'll find plenty more work to do.

nazaruka commented 5 years ago

The latest commit adds some checks that quit if certain errors arise (indexing out of bounds in the lengths and sizes arrays, a product of sizes that does not match the corresponding length; we could probably brainstorm more).

The issue occurs at the call split_vector[i] = torch.reshape(split_vector[i], sizes[i]), where sizes[i] is a tuple of dimensions. I receive this error:

Traceback (most recent call last):
  File "NSGAII.py", line 359, in <module>
    (fitness_scores, novelty_scores) = evaluate_population(solutions, agent)
  File "NSGAII.py", line 242, in evaluate_population
    set_weights(agent.actor_critic, weights)
  File "NSGAII.py", line 291, in set_weights
    split_vector[i] = torch.reshape(split_vector[i], sizes[i])
TypeError: 'tuple' object does not support item assignment

At first I read this as meaning I would have to break up the sizes[i] tuple and pass every individual element to reshape(), but reshape() accepts a tuple of dimensions just fine. The error actually means that split_vector itself is a tuple (torch.split returns one), so it does not support item assignment. I'll be looking at this tonight.
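A hedged sketch of the resulting fix; sizes, lengths, and flat_weights are illustrative stand-ins for the corresponding variables in NSGAII.py:

    import torch

    # Illustrative stand-ins for the variables in NSGAII.py.
    sizes = [(32, 12, 8, 8), (32,)]           # per-layer shapes
    lengths = [32 * 12 * 8 * 8, 32]           # per-layer element counts
    flat_weights = torch.randn(sum(lengths))  # the flat genome vector

    # torch.split returns an immutable tuple; convert it to a list so the
    # reshaped tensors can be assigned back element by element.
    split_vector = list(torch.split(flat_weights, lengths))
    for i in range(len(split_vector)):
        split_vector[i] = torch.reshape(split_vector[i], sizes[i])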

nazaruka commented 5 years ago

For posterity, this snippet:

    for layer, value in net.state_dict().items():
        print(layer, value.size())

returns the following output:

base.gru.weight_ih_l0 torch.Size([1536, 512])
base.gru.weight_hh_l0 torch.Size([1536, 512])
base.gru.bias_ih_l0 torch.Size([1536])
base.gru.bias_hh_l0 torch.Size([1536])
base.main.0.weight torch.Size([32, 12, 8, 8])
base.main.0.bias torch.Size([32])
base.main.2.weight torch.Size([64, 32, 4, 4])
base.main.2.bias torch.Size([64])
base.main.4.weight torch.Size([48, 64, 3, 3])
base.main.4.bias torch.Size([48])
base.main.7.weight torch.Size([512, 41472])
base.main.7.bias torch.Size([512])
base.critic_linear.weight torch.Size([1, 512])
base.critic_linear.bias torch.Size([1])
dist.linear.weight torch.Size([12, 512])
dist.linear.bias torch.Size([12])

schrum2 commented 5 years ago

I made a little progress but haven't solved this yet. When you run the current code, it prints out the current weights for the first layer and waits for a key press. Then it prints out the reshaped weights from the flat vector that will replace them; this works now. Then it tries to reassign the weights, but when it prints them out they have not changed. Here is the output I get:

tensor([[-0.0036,  0.0385, -0.0130,  ...,  0.0092, -0.0114, -0.0004],
        [ 0.0153,  0.0406, -0.0066,  ..., -0.0119, -0.0211, -0.0102],
        [ 0.0315, -0.0085,  0.0042,  ...,  0.0106,  0.0268, -0.0183],
        ...,
        [-0.0092, -0.0170,  0.0120,  ..., -0.0403, -0.0352,  0.0042],
        [ 0.0384, -0.0289, -0.0107,  ..., -0.0036,  0.0246, -0.0060],
        [-0.0024, -0.0051,  0.0355,  ..., -0.0236,  0.0429, -0.0369]],
       device='cuda:0')
Press
tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]], dtype=torch.float64)
Press
tensor([[-0.0036,  0.0385, -0.0130,  ...,  0.0092, -0.0114, -0.0004],
        [ 0.0153,  0.0406, -0.0066,  ..., -0.0119, -0.0211, -0.0102],
        [ 0.0315, -0.0085,  0.0042,  ...,  0.0106,  0.0268, -0.0183],
        ...,
        [-0.0092, -0.0170,  0.0120,  ..., -0.0403, -0.0352,  0.0042],
        [ 0.0384, -0.0289, -0.0107,  ..., -0.0036,  0.0246, -0.0060],
        [-0.0024, -0.0051,  0.0355,  ..., -0.0236,  0.0429, -0.0369]],
       device='cuda:0')
Press

I wonder if I have to explicitly send the changes to device='cuda:0'; note that the replacement tensor also prints as dtype=torch.float64, while the original weights are float32. I also wonder if @nazaruka will have the same problem when using the CPU as the device instead.
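For whoever picks this up, one hedged guess at the cause: rebinding a Python variable to a new tensor never touches the module's parameter storage, while an in-place copy_ does, and it also handles the dtype and device differences visible above. A runnable toy version of the pattern (the same loop would apply to agent.actor_critic.base.main.parameters()):

    import torch
    import torch.nn as nn

    # Toy stand-in for the real network.
    net = nn.Linear(4, 2)
    new_tensors = [torch.ones(2, 4, dtype=torch.float64),
                   torch.zeros(2, dtype=torch.float64)]

    with torch.no_grad():
        for p, new in zip(net.parameters(), new_tensors):
            p.copy_(new)  # in-place: casts float64 -> float32, keeps device
    print(net.weight)     # now all ones, float32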

nazaruka commented 5 years ago

We know for a fact that the initialization works, because I receive the following output when printing solutions[i] and the layers' data:

solutions[0] = [-0.70375955 -0.02706597 -0.64225479 ...  0.62566792  0.73304887  0.08024593]
First line of first layer: tensor([[-0.7038, -0.0271, -0.6423,  ..., -0.0540, -0.0682,  0.4632],
Last line of last layer: tensor([ 0.7952,  0.9370,  0.4612, -0.1530,  0.1014,  0.6257,  0.2201, -0.6740,  0.2653,  0.6257,  0.7330,  0.0802])

New layers are successfully initialized for every episode. I'm going to comment out the rendering and print statements and run again to see whether it works across genomes.