Hello! According to Mnih et al., the function phi preprocesses the last 4 frames of a history and stacks them to produce the input to the Q-function. However, reading your code, I understand that it feeds only one raw frame to the Q-network. Am I right?
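To make the question concrete, here is a minimal sketch of how I understand phi. It is not taken from your code or from the paper's implementation. The names `FrameStacker` and `to_grayscale` are hypothetical, the grayscale conversion stands in for the paper's fuller preprocessing (which also downsamples and crops), and repeating the first frame to fill the stack at the start of an episode is my own assumption.

```python
from collections import deque


def to_grayscale(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to luminance values.

    Simplified stand-in for the paper's preprocessing (assumption).
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


class FrameStacker:
    """Hypothetical helper that keeps the last `num_frames` preprocessed frames."""

    def __init__(self, num_frames=4):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # Assumption: at episode start, fill the stack with copies of the first frame.
        processed = to_grayscale(first_frame)
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(processed)
        return self.state()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(to_grayscale(frame))
        return self.state()

    def state(self):
        # Oldest frame first, newest frame last; this stack is the Q-network input.
        return list(self.frames)
```

With this, the Q-network would receive `stacker.state()` (4 frames) at each step instead of a single raw frame.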
I also found that in dqn.py, the method iterate(self) has a for loop containing this line:
episode = random.randint(max(0, N-50), N-1)
Shouldn't this be N-self.memory instead of N-50?
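In other words, I would have expected something like the sketch below, where the sampling window is the configured memory size rather than a fixed 50. The function name `sample_recent_episode` and its parameters are hypothetical, and I am assuming that self.memory is meant to be the number of most recent episodes to sample from.

```python
import random


def sample_recent_episode(num_episodes, memory, rng=random):
    """Pick a random episode index among the last `memory` episodes.

    Hypothetical helper: `num_episodes` plays the role of N and `memory`
    the role of self.memory (assumed to be a window size in episodes).
    """
    if num_episodes <= 0:
        raise ValueError("there must be at least one stored episode")
    # Clamp at 0 so that fewer than `memory` stored episodes still works.
    low = max(0, num_episodes - memory)
    # random.randint is inclusive on both ends, hence num_episodes - 1.
    return rng.randint(low, num_episodes - 1)
```

With memory set to 50 this behaves the same as the current line, so the change would only matter when self.memory has a different value.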
This is my first interaction here, hope you understand :smile: