inoryy / reaver

Reaver: Modular Deep Reinforcement Learning Framework. Focused on StarCraft II. Supports Gym, Atari, and MuJoCo.
MIT License

Is this a bug in runner.py? #14

Closed Ericonaldo closed 5 years ago

Ericonaldo commented 6 years ago

Thank you for the great code. When I tried new maps, I found a problem in runner.py. When there is more than one env and one env finishes before the others, that env restarts the game. By the time all envs are done, the calculated rewards span several episodes, which gives a much bigger number. If you understand what I am describing, could you tell me whether this is a problem?

Ericonaldo commented 6 years ago

This leaves another problem. According to the code, the model is trained every n_steps, say 12. A single training batch may therefore contain steps from two episodes, so some values calculated with the Bellman equation will be wrong.

inoryy commented 6 years ago

No bugs: the end of an episode is accounted for inside the agent during the returns calculation.
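For readers unfamiliar with the technique, the following is a minimal sketch of how done flags are commonly used to stop returns from leaking across an episode boundary inside one n_steps batch. It is not reaver's actual code: the function name `n_step_returns`, its arguments, and the `[step][env]` layout are assumptions made for illustration.

```python
def n_step_returns(rewards, dones, bootstrap_values, gamma=0.99):
    """Discounted returns for a rollout laid out as [step][env].

    rewards[t][i] and dones[t][i] describe step t of env i;
    bootstrap_values[i] is the value estimate for the state that
    follows the last step of env i. All names are hypothetical.
    """
    n_steps, n_envs = len(rewards), len(bootstrap_values)
    returns = [[0.0] * n_envs for _ in range(n_steps)]
    running = list(bootstrap_values)
    # Walk the rollout backwards, as in the usual Bellman backup.
    for t in reversed(range(n_steps)):
        for i in range(n_envs):
            # If the episode ended at step t, later steps belong to the
            # next episode, so nothing is carried back from them.
            carry = 0.0 if dones[t][i] else running[i]
            running[i] = rewards[t][i] + gamma * carry
            returns[t][i] = running[i]
    return returns
```

With this masking, a batch of 12 steps that contains the end of one episode and the start of the next still yields per-step returns that only look ahead within their own episode.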

Ericonaldo commented 6 years ago

Oh, thanks a lot, but I still think that there is an error in the reward calculation...

inoryy commented 6 years ago

If you're looking at the console logs, then they're calculated here. Notice that rewards are averaged and only displayed after all envs report back a done flag.

Ericonaldo commented 6 years ago

Yeah, I have seen that code. :) I mean that when the average score is calculated, it may include more than one episode for a single env, because that env has to wait for the others to report done. So the result is unlikely to represent the average reward of one episode, which I thought you'd like to record.
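The behaviour described above can be avoided by recording each episode's total at the moment its env reports done, instead of waiting for every env. The sketch below illustrates that idea only. The class `EpisodeRewardTracker` and its methods are hypothetical and are not taken from reaver's runner.py or from the later fix.

```python
class EpisodeRewardTracker:
    """Accumulates rewards per env and records each episode total as it ends."""

    def __init__(self, n_envs):
        self.running = [0.0] * n_envs  # reward of the episode in progress, per env
        self.finished = []             # totals of completed episodes

    def step(self, rewards, dones):
        """Feed one step of per-env rewards and done flags."""
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            self.running[i] += reward
            if done:
                # Record and reset immediately, so an env that restarts
                # never adds a second episode to the same total.
                self.finished.append(self.running[i])
                self.running[i] = 0.0

    def mean_episode_reward(self):
        """Average over completed episodes, or None if there are none yet."""
        if not self.finished:
            return None
        return sum(self.finished) / len(self.finished)


# Toy rollout: env 0 finishes every 2 steps, env 1 after 4 steps,
# and every step gives a reward of 1.
tracker = EpisodeRewardTracker(n_envs=2)
tracker.step([1.0, 1.0], [False, False])
tracker.step([1.0, 1.0], [True, False])
tracker.step([1.0, 1.0], [False, False])
tracker.step([1.0, 1.0], [True, True])
```

In this toy rollout, summing per env until both are done would report 4.0 for each env, even though env 0 played two episodes worth 2.0 each. The tracker instead records three episodes (2.0, 2.0 and 4.0).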

inoryy commented 6 years ago

Okay, I can see it now. Can confirm it's a bug, good catch! This most likely doesn't affect non-adversarial minigames, but definitely might explain high variance for others like DefeatZerglingsAndBanelings.

As I mentioned in #7, I'm currently rewriting the project essentially from scratch, so I don't think I'll have time to fix it in the legacy codebase, but I'll be sure to keep this bug in mind. I plan to publish the rewrite by the end of August.

Ericonaldo commented 6 years ago

Great! Thanks a lot. Looking forward to your next release!

inoryy commented 6 years ago

Let's keep the ticket open until next release so others are informed as well.

inoryy commented 5 years ago

Fixed!