yandexdataschool / Practical_RL

A course in reinforcement learning in the wild
The Unlicense

Week04 suggestion #493

Closed alexeyhorkin closed 2 years ago

alexeyhorkin commented 2 years ago

Hey! I thought this change might be helpful. It makes it easier to tell when I can stop a long training run, because the final_score variable would be on the same scale as the training graph.

Right now, for some reason, we're doing the following:

final_score = evaluate(
    make_env(clip_rewards=False, seed=9),
    agent, n_games=30, greedy=True, t_max=10 * 1000
) * 5
assert final_score >= 15, "not as cool as DQN can"

Suggestion:

final_score = evaluate(
    make_env(clip_rewards=False, seed=9),
    agent, n_games=30, greedy=True, t_max=10 * 1000
)
assert final_score >= 3, "not as cool as DQN can"
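
As a rough sanity check, the two assertions should accept exactly the same agents, since `x * 5 >= 15` is equivalent to `x >= 3` (up to floating-point rounding). The sketch below assumes nothing about the real `evaluate`; `fake_evaluate` is a hypothetical stand-in that simply returns a given mean reward per game.

```python
# fake_evaluate is a hypothetical stand-in for the notebook's evaluate():
# it returns a fixed mean reward so the two checks can be compared directly.
def fake_evaluate(mean_reward):
    return mean_reward


def passes_current_check(mean_reward):
    # Current notebook: scale the mean reward by 5, then compare with 15.
    return fake_evaluate(mean_reward) * 5 >= 15


def passes_suggested_check(mean_reward):
    # Suggested change: compare the unscaled mean reward with 3.
    return fake_evaluate(mean_reward) >= 3


# Both checks agree on every sample value, below and above the threshold.
for mean_reward in [0.0, 2.5, 3.0, 4.5, 10.0]:
    assert passes_current_check(mean_reward) == passes_suggested_check(mean_reward)
```

The only difference is the scale of the reported number: without the factor of 5, final_score can be read directly against the mean reward shown on the training graph.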

What do you think?

I ran into this problem myself and applied this fix. I tested it on Colab and on my laptop, and it worked in both places.

review-notebook-app[bot] commented 2 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

alexeyhorkin commented 2 years ago

@dniku ^^

dniku commented 2 years ago

Thanks!