Closed AjayTalati closed 7 years ago
Dear Ajay, Thank you for your interest in my repo!
Regarding line https://github.com/kimhc6028/pathnet-pytorch/pathnet.py#18 in pathnet.py, you are right.
For line https://github.com/kimhc6028/pathnet-pytorch/main.py#65 in main.py: because the original paper sets M to 20, I followed that setting in the code. Of course, it would be much easier to understand if I set this parameter from the script.
For lines https://github.com/kimhc6028/pathnet-pytorch/pathnet.py#77-79 in pathnet.py, I appreciate that you pointed out a severe bug. Yes, it means N=3, but in the CIFAR-SVHN task N is 5. I have to fix it :)
I will fix all the things you mentioned. Thank you for your interest. I should also tell you that I cannot reproduce the results at the moment, so I am reworking the whole codebase.
I am also really interested in extending my code to the A3C task, which I gave up on because it was too burdensome to work on alone. It would be great if I could help with your work!
Dear Kim,
thank you very much for your help :+1: I really like your code :+1:
I have A3C working for Pong now, but I have not yet tried to use the trained net for transfer to another game.
The Atari games take a long time to train (more than 4 hrs for Pong), because adding an RNN layer like
exec("self.m4" + str(i) + " = nn.GRUCell(32 * 3 * 3 + 2, " + str(n_hidden) +")")
exec("self.rnn.append(self.m4" + str(i) + ")")
is very costly. So I will try to apply the PathNet-A3C algorithm to do transfer learning among the "algorithmic tasks" in OpenAI gym - that should be much quicker!
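Incidentally, the same per-path modules can be registered without exec at all - a minimal sketch, with hypothetical sizes standing in for the real ones:

```python
import torch.nn as nn

class RNNModules(nn.Module):
    """Sketch: build a list of GRUCell modules without exec().
    num_modules, input_size and n_hidden are hypothetical values
    standing in for those in the exec() snippet above."""
    def __init__(self, num_modules=10, input_size=32 * 3 * 3 + 2, n_hidden=256):
        super().__init__()
        # nn.ModuleList registers each GRUCell as a submodule, so its
        # parameters are visible to .parameters(), .cuda(), etc.
        self.rnn = nn.ModuleList(
            nn.GRUCell(input_size, n_hidden) for _ in range(num_modules)
        )
```

The ModuleList version is also easier to index from a sampled path than attributes named by string concatenation.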
It should be fun extending PathNet to RNNs :)
I will share the code with you as soon as I get good results. I would enjoy very much working with you :)
All the best,
Ajay
Dear Ajay, Have you already implemented A3C on top of this code? exec(...) only runs when the network is initialized (which happens only twice in every experiment), so I assumed that using exec does not severely slow learning.
If the problem comes from using an RNN, I would suggest that stacking previous frames should be sufficient, because a stack of frames also satisfies the Markov property. I am actually using https://github.com/ikostrikov/pytorch-a3c in another project, and that code takes less than 30 minutes to learn.
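To illustrate what I mean, here is a minimal frame-stacking sketch (the frame shapes are hypothetical):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the k most recent frames; the stacked frames form the
    agent's state, which restores the Markov property without an RNN."""
    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the stack with the first frame so the state is
        # well-defined from step zero.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def step(self, frame):
        self.frames.append(frame)
        return self.state()

    def state(self):
        return np.stack(self.frames, axis=0)  # shape: (k, *frame.shape)
```

The stacked state can then be fed to the conv-net directly, in place of the GRUCell layer.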
Anyway, it is true that Atari takes a lot of time to learn. Moving to the algorithmic tasks is a really good idea, because everyone can then enjoy the code with fewer computational resources!
I am really surprised and pleased to be working with you. I hope to see your great results soon!
Dear Kim,
I really like the code changes - it's a great improvement - thank you :+1:
I'm still doing experiments on the algorithmic tasks; A3C only seems to learn quickly on the Copy and DuplicatedInput tasks.
The other 4 tasks are much more difficult and learn very slowly (on RepeatCopy it takes many hours to get above 20), as shown in Figure 1 of this paper: Bridging the Gap Between Value and Policy Based Reinforcement Learning.
I think the simplest experiment is to first learn the Copy task, and then use the trained PathNet to see if it improves the learning speed on the RepeatCopy task.
Hopefully we will get interesting results :+1:
Best wishes,
Ajay
Dear Ajay,
If Copy and DuplicatedInput are the only tasks that can be learnt quickly, what about learning Copy as the first task and transferring to DuplicatedInput? That would lessen our work. Also, I think Copy and DuplicatedInput share much in common, so we can expect good transfer performance between the two tasks.
By the way, would you like to integrate your A3C code into this repository, or do you want to build your own repo? If you want to integrate it, do you mind if I add you as a collaborator on this repo?
Sincerely,
Kim
Dear Kim,
my code is very messy at the moment, so maybe I should clean it up before integrating it into the repo?
The code is attached if you want to play with it - I hope it runs for you without any problems.
I agree, I think your idea is really good :) We should start with the simplest task, Copy, and then examine the benefits of using the same PathNet to learn DuplicatedInput. Both tasks run very quickly with the standard A3C algorithm - it uses an embedding to replace the conv-nets.
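Roughly, the input side looks like this - a sketch with hypothetical sizes, since the algorithmic tasks emit one discrete symbol per step rather than an image:

```python
import torch
import torch.nn as nn

class AlgorithmicEncoder(nn.Module):
    """Sketch: for the gym algorithmic tasks an nn.Embedding lookup
    replaces the Atari conv-net. num_symbols, embed_dim and n_hidden
    are hypothetical values, not the ones in my attached code."""
    def __init__(self, num_symbols=6, embed_dim=32, n_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, embed_dim)
        self.gru = nn.GRUCell(embed_dim, n_hidden)

    def forward(self, symbol, hx):
        # symbol: LongTensor of shape (batch,), the observed symbols
        # hx:     (batch, n_hidden), the previous hidden state
        return self.gru(self.embed(symbol), hx)
```

Because the embedding is tiny compared to a conv-net, each A3C step is very cheap, which is why these tasks train so fast.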
Yes, of course, please add me as a collaborator - I would be very proud to be shown as working with you.
Best wishes,
Ajay
Sorry, I forgot - to run it, please use:
OMP_NUM_THREADS=1 python A3C_openai_algorithmic_v3.py
Also, I forgot - you'll need the shared Adam optimizer.
Dear Ajay,
I invited you as a collaborator. Please check it :)
And I made a branch named a3c. Let's modify this branch for our reinforcement learning task.
Sorry, I couldn't find any code. If possible, just push it to the a3c branch, so that we can modify it more easily :)
Hi Kim, the code is attached to the email I sent to your Gmail account - I'm having problems uploading to GitHub from my computer, but I will try to fix that very soon, sorry :(
Dear Ajay,
I confirmed that with your code Copy takes < 1 min to learn and DuplicatedInput takes < 10 mins. Let's work on it :)
Hi Kim,
Great :+1: I'm really happy that it's working for you :)
Hi Kim,
I just saw some really interesting work that I thought you might like: Curiosity-driven Exploration by Self-supervised Prediction.
The code has not been released yet, but check out the project page.
All the best,
Ajay
Dear Ajay, Thank you for sharing the paper. It looks significantly important, as it argues for "reinforcement learning without reward". All we have to do is star the repository and wait for the code - and then let's see if we can implement it in PyTorch? :) Sincerely, Kim
Dear Kim,
thank you very much for sharing your implementation - I like it a lot :+1:
I'm trying to adapt the code to a parallel implementation, to reproduce the Atari A3C experiments from the PathNets paper. I'm not sure I understand the hyper-parametrization in pathnet.py and main.py though? Can you help me understand the following lines please?
In line 18 in pathnet.py, is the 3 the same as L, the number of layers in the paper, and is N the maximum number of distinct modules per layer?
In line 65 in main.py, the list has length 3, so I guess this is for L=3 layers, each with M=20 modules?
In lines 77-79 in pathnet.py, do the 3 terms in the sum correspond to N=3 distinct modules per layer?
Is this correct?
Thanks a lot for your help :+1:
Thanks a lot for your help :+1:
Are you interested in extending your code to parallel implementations, and different architectures?
All the best,
Ajay