jerrodparker20 / adaptive-transformers-in-rl

Adaptive Attention Span for Reinforcement Learning

Stable Transformer on Pong #16

Open furmans opened 3 years ago

furmans commented 3 years ago

Hello,

I am unable to reproduce the paper's Stable Transformer results on the Pong environment. As I read the paper, the mean return over the last 100 episodes should be ~17.62 for this model and environment.

I am running train.py with the arguments given in the README under Best Performing Stable Transformer on Pong.

In train.py, line 731, I changed ctx = mp.get_context("fork") to ctx = mp.get_context("spawn").
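For readers unfamiliar with the change, here is a minimal sketch of what selecting the "spawn" start method looks like in isolation. It assumes mp is Python's standard multiprocessing module (train.py may import torch.multiprocessing instead, which exposes the same call), and the helper name make_context is hypothetical, not something from the repository.

```python
import multiprocessing as mp


def make_context(method="spawn"):
    """Return a multiprocessing context for the given start method.

    Hypothetical helper for illustration; train.py calls
    mp.get_context(...) directly.
    """
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not available here")
    return mp.get_context(method)


# "spawn" starts each worker in a fresh interpreter, whereas "fork"
# copies the parent process, including any state it already holds.
ctx = make_context("spawn")
start_method = ctx.get_start_method()
```

With "spawn", worker targets have to be picklable, module-level functions, and any code that starts processes should sit under an `if __name__ == "__main__":` guard.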

The final results I obtained on one run:

[INFO:17181 train:962 2020-12-01 19:35:33,350] Steps 10001513 @ 668.5 SPS. Loss -15.672254. Return per episode: -12.7. Stats:
{'baseline_loss': 11.395485877990723,
 'entropy_loss': -18.699639002482098,
 'episode_returns': [-20.0, -18.0, -19.0],
 'last_100_episode_returns': -19.530000686645508,
 'learning_rate': 8.657589688233862e-05,
 'len_max_traj': 239,
 'max_return_achieved': '-14.0 at step 5366379',
 'mean_episode_return': -12.666666666666666,
 'num_unpadded_steps': 3346,
 'pg_loss': -8.368099212646484,
 'total_loss': -15.672253926595053}
[INFO:17181 train:969 2020-12-01 19:35:33,350] Learning finished after 10001513 steps.

Results from another run:

[INFO:15271 train:962 2020-12-04 19:47:48,776] Steps 10001156 @ 661.4 SPS. Loss -9.595014. Return per episode: -19.7. Stats:
{'baseline_loss': 14.119840621948242,
 'entropy_loss': -18.633128484090168,
 'episode_returns': [-21.0, -19.0, -20.0, -19.0],
 'last_100_episode_returns': -19.540000915527344,
 'learning_rate': 9.02709105067138e-05,
 'len_max_traj': 239,
 'max_return_achieved': '-14.0 at step 7824133',
 'mean_episode_return': -19.666666666666668,
 'num_unpadded_steps': 3309,
 'pg_loss': -5.081725597381592,
 'total_loss': -9.595013936360678}
[INFO:15271 train:969 2020-12-04 19:47:48,776] Learning finished after 10001156 steps.
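To compare runs like the two above against the expected value, the stats block can be read back into Python. This is only a sketch: it assumes the block is a plain pprint-formatted dict, as it appears in the log, and the names parse_stats, stats_text, and target are made up for illustration. The two entries are copied from the second run.

```python
import ast

# Two entries copied from the second run's final stats block.
stats_text = """{'last_100_episode_returns': -19.540000915527344,
 'max_return_achieved': '-14.0 at step 7824133'}"""


def parse_stats(text):
    """Parse a pprint-formatted stats dict copied from the training log."""
    return ast.literal_eval(text)


stats = parse_stats(stats_text)
target = 17.62  # value this issue attributes to the paper
gap = target - stats["last_100_episode_returns"]
print(f"last 100 episodes: {stats['last_100_episode_returns']:.2f}, "
      f"gap to target: {gap:.2f}")
```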

I am on Ubuntu 18.04.4, using Cuda 10.2, cudnn 7, torch 1.6.0.

Thanks in advance for any help.

Best, Sean

BKHMSI commented 3 years ago

Hi @furmans,

I am having the same problem. Were you able to make it work?

skkuai commented 2 years ago

I made it work. Please see #17.