Issues in _process_base and _process_pc

This is really nice work! I have some questions about the implementation details. Each "frame" stores prev_state, action, reward, terminal, last_action, and last_reward, so the observation from "new_state" is not stored in the frame and is only available through the "self.environment" object. In trainer.py line 212, since new_state has already been reached, shouldn't the action_reward input be built from environment.last_action and environment.last_reward instead of from the frame? Similarly, in trainer.py line 255, the observation from the new state is not stored in the frame. Specifically, pc_experience_frames[0].terminal (from the 21st frame) indicates whether the 22nd state is terminal, but the bootstrap value (R = 0 if terminal, otherwise R = max_a Q(s, a)) is computed from the 21st state rather than the 22nd.
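To make the indexing concern concrete, here is a minimal sketch of an n-step return in which the bootstrap value belongs to the state reached after the last frame. It is not the repository's code: `Frame` and `n_step_returns` are hypothetical names, and the only thing assumed from the project is the frame layout listed above.

```python
from collections import namedtuple

# Hypothetical stand-in for the frame described above (not the repository's class).
# "state" is the observation the action was taken from; the state reached
# afterwards is NOT part of the frame.
Frame = namedtuple(
    "Frame", "state action reward terminal last_action last_reward"
)


def n_step_returns(frames, bootstrap_value, gamma=0.99):
    """Discounted returns for a rollout given in time order.

    bootstrap_value must be the value estimate of the state reached AFTER
    the final frame. It is ignored when the final frame is terminal,
    because frame.terminal describes that next state.
    """
    running = 0.0 if frames[-1].terminal else bootstrap_value
    returns = []
    for frame in reversed(frames):
        running = frame.reward + gamma * running
        returns.append(running)
    returns.reverse()  # back to time order
    return returns
```

Under this convention, the network input used to compute `bootstrap_value` would be the new state together with the action and reward that led to it, which is what the environment object holds and the last frame does not.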

miyosuda / unreal

Issues in _process_base and _process_pc #14