pocokhc / agent57

Implementation code for Agent57 (reinforcement learning), written for a Qiita post.
MIT License

Questions about some example code #2

Closed Gaethje closed 4 years ago

Gaethje commented 4 years ago

I have some questions regarding atari_pong.py. I executed the example code (probably with only one actor), and after running for about 15 hours the reward is still negative. My motivation was that, as the Agent57 paper says, the max score for Pong using Agent57 is near 20, so I assumed the reward should get close to 20. Is it because they used 512 actors while here we are using 1-4 actors? Is that the reason for the slowdown? How can I scale up? Which portion of the code needs to be changed to scale up the process and reach the optimal rewards?

Maybe I am understanding this the wrong way; I am very new to RL. I found your Qiita posts very informative and helpful for learning RL. I understand it's a lot to ask, but if you have time, please answer these questions.

Thanks.

pocokhc commented 4 years ago

Thank you for your good question.

That's right; my PC is not powerful enough to run 512 actors (2 actors already maxed it out).

So, to make Pong easier to learn, I defined a dedicated processor (class AtariPong in agent/processor.py):

class AtariPong(AtariProcessor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.nb_actions = 3  # use only 3 of the actions

    def process_action(self, action):
        # map the reduced action indices back to the game keys actually used
        keys = [0, 2, 3]
        return keys[action]

    def process_step(self, observation, reward, done, info):
        observation, reward, done, info = super().process_step(observation, reward, done, info)
        if reward != 0:
            done = True  # end the episode as soon as a point is scored
        return observation, reward, done, info

There are two changes:
・The number of actions is reduced from 5 to 3 (unused keys are dropped).
・An episode ends after a single point (an actual game continues for about 20 points).

(However, even with this I could not confirm that Pong learns... It is unclear whether the algorithm's performance is poor or the parameters are bad.)
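The effect of those two tweaks can be illustrated with a minimal, self-contained stand-in (a hypothetical class; the real AtariPong inherits from the repo's AtariProcessor, while this version has no dependencies so it runs anywhere):

```python
# Minimal stand-in illustrating the two Pong-specific tweaks above.
# (Hypothetical sketch; PongLikeProcessor is not the repo's actual class.)
class PongLikeProcessor:
    nb_actions = 3  # reduced action set: 3 of the original keys

    def process_action(self, action):
        # map the reduced indices 0..2 back to the game's key codes
        keys = [0, 2, 3]
        return keys[action]

    def process_step(self, observation, reward, done, info):
        # end the episode as soon as either side scores a point
        if reward != 0:
            done = True
        return observation, reward, done, info


p = PongLikeProcessor()
print(p.process_action(1))                      # reduced action 1 -> key 2
print(p.process_step(None, 1.0, False, {})[2])  # scoring ends the episode
```

Because `done` is forced to True on any nonzero reward, each episode is at most one rally long, which is what makes the task shorter than a full game.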

If you want to run Pong with its original settings, change the following in examples/atari_pong.py:

 (snip)
from agent.processor import AtariPong
from agent.processor import AtariProcessor  #--- add ---
 (snip)

def create_parameter(env):

    #processor = AtariPong()      #--- change ---
    processor = AtariProcessor()  #--- add ---

    (snip)

    kwargs = {
        (snip)

        #"nb_actions": processor.nb_actions,   #--- change ---
        "nb_actions": env.action_space.n,      #--- change ---

       (snip)
    }
    return kwargs
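Put together, the switch boils down to choosing the processor and the matching action count. Here is a runnable sketch using stand-in classes (FakeEnv, FullProcessor, and ReducedProcessor are hypothetical; the real objects come from the repo and gym):

```python
# Hypothetical sketch of the switch described above, with stand-in classes
# so it runs without gym or the repo installed.
class FakeActionSpace:
    n = 6  # Pong's full action set size in Gym


class FakeEnv:
    action_space = FakeActionSpace()


class FullProcessor:       # stand-in for the repo's AtariProcessor
    pass


class ReducedProcessor:    # stand-in for the repo's AtariPong
    nb_actions = 3


def create_parameter(env, reduced=False):
    # choose the processor and the matching action count
    if reduced:
        processor = ReducedProcessor()
        nb_actions = processor.nb_actions  # 3: the Pong-specific shortcut
    else:
        processor = FullProcessor()
        nb_actions = env.action_space.n    # the env's original action set
    return {"processor": processor, "nb_actions": nb_actions}


print(create_parameter(FakeEnv())["nb_actions"])                # full set
print(create_parameter(FakeEnv(), reduced=True)["nb_actions"])  # reduced set
```

The key point is that "nb_actions" must agree with whichever processor is active: the reduced processor exposes its own count, while the original setup reads it from the environment's action space.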