Closed AjayTalati closed 7 years ago
Dear Ajay, Thank you for your interest in my repo!
Regarding line https://github.com/kimhc6028/pathnet-pytorch/pathnet.py#18 in pathnet.py, you are right.
For line https://github.com/kimhc6028/pathnet-pytorch/main.py#65 in main.py: because the original paper sets M to 20, I followed that setting in the code. Of course, it would be much easier to understand if I set this parameter from the script.
For lines https://github.com/kimhc6028/pathnet-pytorch/pathnet.py#77-79 in pathnet.py, I appreciate that you pointed out a severe bug. Yes, it means N=3, but in the CIFAR-SVHN task N is 5. I have to fix it :)
I will fix all the things you mentioned. Thank you for your interest. I should also tell you that I cannot reproduce the results at the moment, so I am reworking the whole codebase.
I am also really interested in extending my code to the A3C task, which I gave up on because it was too burdensome to work on alone. It would be great if I could help with your work!
Dear Kim,
thank you very much for your help :+1: I really like your code :+1:
I have A3C working for Pong now, but I have not yet tried to use the trained net for transfer to another game.
The Atari games take a long time to train (more than 4 hrs for Pong), because adding an RNN layer like
exec("self.m4" + str(i) + " = nn.GRUCell(32 * 3 * 3 + 2, " + str(n_hidden) +")")
exec("self.rnn.append(self.m4" + str(i) + ")")
is very costly. So I will try to apply the PathNet-A3C algorithm to do transfer learning among the "algorithmic tasks" in OpenAI gym - that should be much quicker!
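Incidentally, the same per-path modules can be registered without exec at all - a minimal sketch, with hypothetical sizes standing in for the real ones:

```python
import torch.nn as nn

class RNNModules(nn.Module):
    """Sketch: build a list of GRUCell modules without exec().
    num_modules, input_size and n_hidden are hypothetical values
    standing in for those in the exec() snippet above."""
    def __init__(self, num_modules=10, input_size=32 * 3 * 3 + 2, n_hidden=256):
        super().__init__()
        # nn.ModuleList registers each GRUCell as a submodule, so its
        # parameters are visible to .parameters(), .cuda(), etc.
        self.rnn = nn.ModuleList(
            nn.GRUCell(input_size, n_hidden) for _ in range(num_modules)
        )
```

The ModuleList version is also easier to index from a sampled path than attributes named by string concatenation.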
It should be fun extending PathNet to RNNs :)
I will share the code with you as soon as I get good results. I would enjoy very much working with you :)
All the best,
Ajay
Dear Ajay, Have you already implemented A3C on top of this code? exec(...) only runs when the network is initialized (which happens only twice in every experiment), so I assumed that using exec does not severely slow learning.
If the problem comes from using an RNN, I would suggest that stacking previous frames should be sufficient, because a stack of frames also satisfies the Markov property. I am actually using https://github.com/ikostrikov/pytorch-a3c in another project, and that code takes less than 30 minutes to learn.
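To illustrate what I mean, here is a minimal frame-stacking sketch (the frame shapes are hypothetical):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the k most recent frames; the stacked frames form the
    agent's state, which restores the Markov property without an RNN."""
    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the stack with the first frame so the state is
        # well-defined from step zero.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def step(self, frame):
        self.frames.append(frame)
        return self.state()

    def state(self):
        return np.stack(self.frames, axis=0)  # shape: (k, *frame.shape)
```

The stacked state can then be fed to the conv-net directly, in place of the GRUCell layer.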
Anyway, it is true that Atari takes a lot of time to learn. Moving to the algorithmic tasks is a really good idea, because everyone can then enjoy the code with fewer computational resources!
I am really surprised and pleased to be working with you. I hope to see your great results soon!
Dear Kim,
I really like the code changes - it's a great improvement - thank you :+1:
I'm still doing experiments on the algorithmic tasks; A3C only seems to learn quickly on the Copy and DuplicatedInput tasks.
The other 4 tasks are much more difficult and learn very slowly (on RepeatCopy it takes many hours to get above 20), as shown in Figure 1 of this paper: Bridging the Gap Between Value and Policy Based Reinforcement Learning.
I think the simplest experiment is to first learn the Copy task, and then use the trained PathNet to see if it improves the learning speed on the RepeatCopy task.
Hopefully we will get interesting results :+1:
Best wishes,
Ajay
Dear Ajay,
If Copy and DuplicatedInput are the only tasks that can be learnt quickly, what about learning Copy as the first task and transferring to DuplicatedInput? That would lessen our work. Also, I think Copy and DuplicatedInput share much in common, so we can expect good transfer performance between the two tasks.
By the way, would you like to integrate your A3C code into this repository, or do you want to build your own repo? If you want to integrate it, do you mind if I add you as a collaborator on this repo?
Sincerely,
Kim
Dear Kim,
my code is very messy at the moment, so maybe I should clean it up before integrating it into the repo?
The code is attached if you want to play with it - I hope it runs for you without any problems.
I agree, I think your idea is really good :) We should start with the simplest task, Copy, and then examine the benefits of using the same PathNet to learn DuplicatedInput. Both tasks run very quickly with the standard A3C algorithm - it uses an embedding to replace the conv-nets.
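Roughly, the input side looks like this - a sketch with hypothetical sizes, since the algorithmic tasks emit one discrete symbol per step rather than an image:

```python
import torch
import torch.nn as nn

class AlgorithmicEncoder(nn.Module):
    """Sketch: for the gym algorithmic tasks an nn.Embedding lookup
    replaces the Atari conv-net. num_symbols, embed_dim and n_hidden
    are hypothetical values, not the ones in my attached code."""
    def __init__(self, num_symbols=6, embed_dim=32, n_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, embed_dim)
        self.gru = nn.GRUCell(embed_dim, n_hidden)

    def forward(self, symbol, hx):
        # symbol: LongTensor of shape (batch,), the observed symbols
        # hx:     (batch, n_hidden), the previous hidden state
        return self.gru(self.embed(symbol), hx)
```

Because the embedding is tiny compared to a conv-net, each A3C step is very cheap, which is why these tasks train so fast.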
Yes, of course, please add me as a collaborator - I would be very proud to be shown as working with you.
Best wishes,
Ajay
Sorry, I forgot - to run it, please use:
OMP_NUM_THREADS=1 python A3C_openai_algorithmic_v3.py
Also, I forgot - you'll need the shared Adam optimizer.
Dear Ajay,
I invited you as a collaborator. Please check it :)
And I made a branch named a3c. Let's modify this branch for our reinforcement learning task.
Sorry, I couldn't find any code. If possible, just push it to the a3c branch, so that we can modify it more easily :)
Hi Kim, the code is attached to the email I sent to your Gmail account - I'm having problems uploading to GitHub from my computer, but I will try to fix that very soon, sorry :(
Dear Ajay,
I confirmed that with your code Copy takes < 1 min to learn and DuplicatedInput takes < 10 mins. Let's work on it :)
Hi Kim,
Great :+1: I'm really happy that it's working for you :)
Hi Kim,
I just saw some really interesting work that I thought you might like: Curiosity-driven Exploration by Self-supervised Prediction.
The code has not been released yet, but check out the project page.
All the best,
Ajay
Dear Ajay, Thank you for sharing the paper. It looks significantly important, as it argues for "reinforcement learning without reward". All we have to do is star the repository and wait for the code - and then let's see if we can implement it in PyTorch? :) Sincerely, Kim
Dear Kim,
thank you very much for sharing your implementation - I like it a lot :+1:
I'm trying to adapt the code to a parallel implementation, to reproduce the Atari A3C experiments from the PathNets paper. I'm not sure I understand the hyper-parametrization in pathnet.py and main.py though? Can you help me understand the following lines please?
In line 18 in pathnet.py, is the 3 the same as L, the number of layers in the paper, and is N the maximum number of distinct modules per layer?
In line 65 in main.py, the list has length 3, so I guess this is for L=3 layers, each with M=20 modules?
In lines 77-79 in pathnet.py, do the 3 terms in the sum correspond to N=3 distinct modules per layer?
Is this correct?
Thanks a lot for your help :+1:
Thanks a lot for your help :+1:
Are you interested in extending your code to parallel implementations, and different architectures?
All the best,
Ajay