kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
MIT License
2.38k stars, 450 forks

After loading 50 trajectories, the terminal shows `killed` #8

Closed: SunHaoOne closed this issue 3 years ago

SunHaoOne commented 3 years ago

Hello, thanks for your code. After reading readme-atari.md, I set up the environment, downloaded the dataset, and tried to run the following (I removed --block_size 90 because there is no block_size argument):

python -m atari.run_dt_atari.py --seed 123 --epochs 5 --model_type 'reward_conditioned' --num_steps 500000 --num_buffers 50 --game 'Breakout' --batch_size 64 --data_dir_prefix /home/shy/decision-transformer/atari/dqn_replay/

Then it shows:

loading from buffer 45 which has 0 already loaded
this buffer has 2196 loaded transitions and there are now 2196 transitions total divided into 1 trajectories
loading from buffer 2 which has 0 already loaded
this buffer has 1234 loaded transitions and there are now 3430 transitions total divided into 2 trajectories
loading from buffer 28 which has 0 already loaded
this buffer has 2413 loaded transitions and there are now 5843 transitions total divided into 3 trajectories
loading from buffer 34 which has 0 already loaded
this buffer has 2718 loaded transitions and there are now 8561 transitions total divided into 4 trajectories
loading from buffer 38 which has 0 already loaded
this buffer has 2326 loaded transitions and there are now 10887 transitions total divided into 5 trajectories
loading from buffer 17 which has 0 already loaded
this buffer has 2425 loaded transitions and there are now 13312 transitions total divided into 6 trajectories
loading from buffer 19 which has 0 already loaded
this buffer has 3063 loaded transitions and there are now 16375 transitions total divided into 7 trajectories
loading from buffer 42 which has 0 already loaded
this buffer has 1190 loaded transitions and there are now 17565 transitions total divided into 8 trajectories
loading from buffer 22 which has 0 already loaded
this buffer has 1002 loaded transitions and there are now 18567 transitions total divided into 9 trajectories
loading from buffer 33 which has 0 already loaded
this buffer has 1473 loaded transitions and there are now 20040 transitions total divided into 10 trajectories
loading from buffer 32 which has 0 already loaded
this buffer has 4009 loaded transitions and there are now 24049 transitions total divided into 11 trajectories
loading from buffer 49 which has 0 already loaded
this buffer has 2006 loaded transitions and there are now 26055 transitions total divided into 12 trajectories
loading from buffer 47 which has 0 already loaded
this buffer has 1935 loaded transitions and there are now 27990 transitions total divided into 13 trajectories
loading from buffer 9 which has 0 already loaded
this buffer has 1750 loaded transitions and there are now 29740 transitions total divided into 14 trajectories
loading from buffer 32 which has 4009 already loaded
this buffer has 7451 loaded transitions and there are now 33182 transitions total divided into 15 trajectories
loading from buffer 46 which has 0 already loaded
this buffer has 2137 loaded transitions and there are now 35319 transitions total divided into 16 trajectories
loading from buffer 32 which has 7451 already loaded
this buffer has 10311 loaded transitions and there are now 38179 transitions total divided into 17 trajectories
loading from buffer 47 which has 1935 already loaded
this buffer has 5165 loaded transitions and there are now 41409 transitions total divided into 18 trajectories
loading from buffer 25 which has 0 already loaded
this buffer has 2124 loaded transitions and there are now 43533 transitions total divided into 19 trajectories
loading from buffer 19 which has 3063 already loaded
this buffer has 5660 loaded transitions and there are now 46130 transitions total divided into 20 trajectories
loading from buffer 14 which has 0 already loaded
this buffer has 1462 loaded transitions and there are now 47592 transitions total divided into 21 trajectories
loading from buffer 36 which has 0 already loaded
this buffer has 1173 loaded transitions and there are now 48765 transitions total divided into 22 trajectories
loading from buffer 32 which has 10311 already loaded
this buffer has 13460 loaded transitions and there are now 51914 transitions total divided into 23 trajectories
loading from buffer 16 which has 0 already loaded
this buffer has 2148 loaded transitions and there are now 54062 transitions total divided into 24 trajectories
loading from buffer 4 which has 0 already loaded
this buffer has 1754 loaded transitions and there are now 55816 transitions total divided into 25 trajectories
loading from buffer 49 which has 2006 already loaded
this buffer has 4612 loaded transitions and there are now 58422 transitions total divided into 26 trajectories
loading from buffer 3 which has 0 already loaded
this buffer has 1200 loaded transitions and there are now 59622 transitions total divided into 27 trajectories
loading from buffer 2 which has 1234 already loaded
this buffer has 2192 loaded transitions and there are now 60580 transitions total divided into 28 trajectories
loading from buffer 20 which has 0 already loaded
this buffer has 1644 loaded transitions and there are now 62224 transitions total divided into 29 trajectories
loading from buffer 39 which has 0 already loaded
this buffer has 1473 loaded transitions and there are now 63697 transitions total divided into 30 trajectories
loading from buffer 2 which has 2192 already loaded
this buffer has 3244 loaded transitions and there are now 64749 transitions total divided into 31 trajectories
loading from buffer 20 which has 1644 already loaded
this buffer has 4785 loaded transitions and there are now 67890 transitions total divided into 32 trajectories
loading from buffer 47 which has 5165 already loaded
this buffer has 7681 loaded transitions and there are now 70406 transitions total divided into 33 trajectories
loading from buffer 48 which has 0 already loaded
this buffer has 2836 loaded transitions and there are now 73242 transitions total divided into 34 trajectories
loading from buffer 7 which has 0 already loaded
this buffer has 2135 loaded transitions and there are now 75377 transitions total divided into 35 trajectories
loading from buffer 41 which has 0 already loaded
this buffer has 933 loaded transitions and there are now 76310 transitions total divided into 36 trajectories
loading from buffer 35 which has 0 already loaded
this buffer has 1973 loaded transitions and there are now 78283 transitions total divided into 37 trajectories
loading from buffer 28 which has 2413 already loaded
this buffer has 4864 loaded transitions and there are now 80734 transitions total divided into 38 trajectories
loading from buffer 38 which has 2326 already loaded
this buffer has 5358 loaded transitions and there are now 83766 transitions total divided into 39 trajectories
loading from buffer 33 which has 1473 already loaded
this buffer has 3457 loaded transitions and there are now 85750 transitions total divided into 40 trajectories
loading from buffer 21 which has 0 already loaded
this buffer has 2198 loaded transitions and there are now 87948 transitions total divided into 41 trajectories
loading from buffer 30 which has 0 already loaded
this buffer has 2916 loaded transitions and there are now 90864 transitions total divided into 42 trajectories
loading from buffer 27 which has 0 already loaded
this buffer has 2128 loaded transitions and there are now 92992 transitions total divided into 43 trajectories
loading from buffer 34 which has 2718 already loaded
this buffer has 4650 loaded transitions and there are now 94924 transitions total divided into 44 trajectories
loading from buffer 33 which has 3457 already loaded
this buffer has 6102 loaded transitions and there are now 97569 transitions total divided into 45 trajectories
loading from buffer 12 which has 0 already loaded
this buffer has 3207 loaded transitions and there are now 100776 transitions total divided into 46 trajectories
loading from buffer 40 which has 0 already loaded
this buffer has 1369 loaded transitions and there are now 102145 transitions total divided into 47 trajectories
loading from buffer 3 which has 1200 already loaded
this buffer has 3316 loaded transitions and there are now 104261 transitions total divided into 48 trajectories
loading from buffer 42 which has 1190 already loaded
this buffer has 2969 loaded transitions and there are now 106040 transitions total divided into 49 trajectories
loading from buffer 5 which has 0 already loaded
this buffer has 2499 loaded transitions and there are now 108539 transitions total divided into 50 trajectories
loading from buffer 0 which has 0 already loaded
killed
(decision-transformer-atari) shy@user:~/decision-transformer$ 

Then it shows `killed`. I guess this is caused by excessive memory usage while loading the dataset. How can I fix it? Thanks a lot.
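
For a sense of scale, the sketch below estimates the RAM taken by the extracted observations alone. It assumes each transition is kept as a stack of four 84×84 uint8 frames (the usual DQN Atari preprocessing); list overhead, the replay buffer being read, and the training process itself are not counted, so actual usage will be higher.

```python
FRAME_BYTES = 84 * 84  # one 84x84 grayscale frame, 1 byte per pixel (uint8)
STACK_SIZE = 4         # assumed number of stacked frames per stored state


def dataset_gib(num_transitions: int, stack_size: int = STACK_SIZE) -> float:
    """Approximate GiB held by the stacked uint8 observations alone."""
    return num_transitions * stack_size * FRAME_BYTES / 2**30


# Transition counts taken from the log above and the --num_steps target.
for n in (108_539, 500_000):
    print(f"{n:>7,} transitions -> {dataset_gib(n):6.2f} GiB")
```

Under these assumptions, the 500,000 transitions requested by --num_steps come to roughly 13 GiB before anything else is allocated.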

lili-chen commented 3 years ago

Yes, I have had this issue before, but unfortunately I am not sure how to fix it. It only happened to me on some machines while other jobs were running. Reducing replay_capacity to 100000 here https://github.com/kzl/decision-transformer/blob/c9e6ac0b75895cef3e7c06cd309fd398ec9ceef5/atari/create_dataset.py#L45 might help, but I have not tried this yet. Let me know if that works!
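
For intuition on why replay_capacity could matter: a Dopamine-style replay buffer preallocates its observation store with one 84×84 uint8 frame per slot of capacity (plus much smaller per-step arrays for actions, rewards and terminals). The arithmetic below assumes the observation array dominates and scales with replay_capacity; whether the loaded checkpoint actually shrinks along with that setting is worth verifying separately.

```python
FRAME_BYTES = 84 * 84  # bytes per stored uint8 observation (84x84 grayscale)


def replay_obs_gib(replay_capacity: int, frame_bytes: int = FRAME_BYTES) -> float:
    """GiB for an observation array with one frame per capacity slot."""
    return replay_capacity * frame_bytes / 2**30


for capacity in (1_000_000, 100_000):
    print(f"replay_capacity={capacity:>9,}: ~{replay_obs_gib(capacity):.2f} GiB")
```

Under that assumption, going from 1,000,000 to 100,000 cuts the observation store per loaded checkpoint from about 6.6 GiB to about 0.66 GiB.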

SunHaoOne commented 3 years ago

Thanks for your reply. It is indeed a CPU memory problem. I changed replay_capacity to 100,000 and num_steps to 100,000, and also changed the following:

line 65: if i >= 100000:
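
If the change on line 65 is typed literally with a thousands separator, as in `if i >= 100,000:`, Python does not compare against one hundred thousand: the comma builds the tuple `(i >= 100, 000)`, and a non-empty tuple is always truthy, so the branch would fire on every step. The literal needs to be `100000` (or `100_000`) and should match the replay_capacity value. The snippet demonstrates the pitfall; `REPLAY_CAPACITY` and both helper functions are hypothetical names for illustration.

```python
REPLAY_CAPACITY = 100_000  # hypothetical constant; keep in sync with replay_capacity


def broken_guard(i: int) -> bool:
    # `i >= 100,000` parses as the tuple (i >= 100, 000); non-empty tuples are truthy.
    return bool((i >= 100, 000))


def fixed_guard(i: int, capacity: int = REPLAY_CAPACITY) -> bool:
    # True once the read index reaches the end of the loaded buffer.
    return i >= capacity


print(broken_guard(5), fixed_guard(5))  # prints: True False
print(broken_guard(100_000), fixed_guard(100_000))  # prints: True True
```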

I'm not sure I made the right changes, but the program now runs and shows the following:

this buffer has 3207 loaded transitions and there are now 100776 transitions total divided into 46 trajectories
max rtg is 84
max timestep is 2632
epoch 1 iter 1573: train loss 0.67590. lr 5.600293e-04: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1574/1574 [03:40<00:00,  7.14it/s]
target return: 90, eval return: 13
epoch 2 iter 1573: train loss 0.57056. lr 4.503075e-04: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1574/1574 [03:51<00:00,  6.80it/s]
target return: 90, eval return: 14
epoch 3 iter 1573: train loss 0.38198. lr 3.002664e-04: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1574/1574 [03:46<00:00,  6.96it/s]
target return: 90, eval return: 37
epoch 4 iter 1573: train loss 0.36018. lr 1.501538e-04: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1574/1574 [03:47<00:00,  6.93it/s]
target return: 90, eval return: 26
epoch 5 iter 1573: train loss 0.22186. lr 6.000000e-05: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1574/1574 [03:51<00:00,  6.79it/s]
target return: 90, eval return: 22

However, it seems the data is not enough, since the eval return stays below the target. Do you have any suggestions? Thanks a lot.

lili-chen commented 3 years ago

Could you try keeping the original num_steps=500000 but using the reduced replay_capacity=100000? (I think the replay_capacity there can be smaller than num_steps since it is referring to the number of samples loaded from each of the 50 checkpoints).
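
The distinction can be illustrated with a toy model of the loading loop seen in the log above: repeatedly pick a random checkpoint, continue reading whole trajectories from where the previous read of that checkpoint stopped, and stop once the running total reaches num_steps. replay_capacity only bounds the read position inside one checkpoint, which is why it can be far smaller than num_steps. The function, its parameters and the random trajectory lengths are hypothetical stand-ins, not the repository's create_dataset.

```python
import random


def simulate_loading(num_steps, num_buffers=50, replay_capacity=100_000,
                     trajectories_per_buffer=1, max_len=4000, seed=0):
    """Toy model of the loading loop; returns (total, read positions, loads)."""
    rng = random.Random(seed)
    position = [0] * num_buffers  # next unread index inside each checkpoint
    total = loads = 0
    while total < num_steps:
        # Conservative stop: no checkpoint is guaranteed room for another trajectory.
        if all(p + max_len > replay_capacity for p in position):
            raise RuntimeError("every checkpoint is exhausted before num_steps")
        b = rng.randrange(num_buffers)  # pick a checkpoint at random
        loads += 1
        for _ in range(trajectories_per_buffer):
            length = rng.randint(900, max_len)  # stand-in trajectory length
            if position[b] + length > replay_capacity:
                break  # would run past the capacity: drop this partial trajectory
            position[b] += length
            total += length
    return total, position, loads


total, position, loads = simulate_loading(num_steps=500_000)
print(f"{total} transitions from {loads} loads; deepest read {max(position)}")
```

In this model, raising trajectories_per_buffer sharply cuts the number of checkpoint loads needed to reach the same num_steps.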

SunHaoOne commented 3 years ago

Thanks for your quick reply. I tried the original num_steps=500000 with the reduced replay_capacity=100000 again, and it still shows `killed`. Checking top, I found that KiB Swap shows 0 free. I will try num_steps values between 100,000 and 500,000. (In the end, only reducing num_steps to 100,000 made the program work.)
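
Besides top, the same figures can be read on Linux from /proc/meminfo, for instance to check MemAvailable and SwapFree before starting a long dataset load. A minimal sketch; it assumes the usual `Key:   value kB` layout of that file and only works on Linux.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB (KiB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_report(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Return the fields most relevant to a process being killed for lack of memory."""
    info = parse_meminfo(Path(path).read_text())
    return {k: info[k] for k in ("MemAvailable", "SwapTotal", "SwapFree") if k in info}


if Path("/proc/meminfo").exists():  # Linux only
    print(memory_report())
```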

lili-chen commented 3 years ago

Hm, I'm not sure then. (As a last resort, does setting --trajectories_per_buffer 10 work?) I have been thinking about saving the 1% replay dataset which should fix this problem but that will take some time. Sorry I don't have a better fix at the moment!
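
One way to approach the idea of saving the dataset, sketched here as an assumption rather than the authors' plan: write the extracted frames to a .npy file once, streaming them in through numpy.lib.format.open_memmap so no second full copy is built in RAM, then reopen the file with np.load(..., mmap_mode='r') so frames are paged in from disk as training reads them. The function names and the (N, 4, 84, 84) uint8 layout are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def save_frames(path, frames, frame_shape=(4, 84, 84)):
    """Stream stacked uint8 frames into a .npy file one row at a time."""
    out = open_memmap(path, mode="w+", dtype=np.uint8,
                      shape=(len(frames),) + tuple(frame_shape))
    for i, frame in enumerate(frames):
        out[i] = frame
    out.flush()  # make sure everything is written to disk


def load_frames(path):
    """Open the saved frames read-only; slices are read from disk on demand."""
    return np.load(path, mmap_mode="r")


# Tiny demo with three fake frames whose pixels equal their index.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obss.npy")
    save_frames(path, [np.full((4, 84, 84), k, dtype=np.uint8) for k in range(3)])
    frames = load_frames(path)
    demo_shape, demo_pixel = frames.shape, int(frames[2, 0, 0, 0])
    del frames  # release the mapping before the directory is removed
print(demo_shape, demo_pixel)  # prints: (3, 4, 84, 84) 2
```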

SunHaoOne commented 3 years ago

Great, it works! Now there are about 380 trajectories, with max rtg = 98 and max timestep = 2654. I will paste the results here after training. Thanks again!

loading from buffer 28 which has 15858 already loaded
this buffer has 31057 loaded transitions and there are now 510358 transitions total divided into 380 trajectories
max rtg is 98
max timestep is 2654
epoch 1 iter 7972: train loss 0.81460. lr 5.598514e-04: 100%|█| 7973/7973 [19:25
target return: 90, eval return: 82
epoch 2 iter 7972: train loss 0.66133. lr 4.500607e-04: 100%|████████| 7973/7973 [19:05<00:00,  6.96it/s]
target return: 90, eval return: 55
epoch 3 iter 7972: train loss 0.50567. lr 3.000525e-04: 100%|████████| 7973/7973 [19:47<00:00,  6.71it/s]
target return: 90, eval return: 63
epoch 4 iter 7972: train loss 0.44295. lr 1.500303e-04: 100%|████████| 7973/7973 [20:56<00:00,  6.34it/s]
target return: 90, eval return: 55
epoch 5 iter 7972: train loss 0.36810. lr 6.000000e-05: 100%|████████| 7973/7973 [20:43<00:00,  6.41it/s]
target return: 90, eval return: 42

The eval return seems higher than before, so the larger dataset may help train the agent.
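
To put numbers on the comparison, the eval returns can be pulled from the two logs above and summarized. The regex assumes the `target return: X, eval return: Y` line printed after each epoch. With only five evaluations per run, the gap (mean 22.4 vs 59.4) is suggestive rather than conclusive.

```python
import re
from statistics import mean
from typing import List

EVAL_RE = re.compile(r"target return: (\d+), eval return: (\d+)")

# Eval lines copied from the two training logs above.
RUN_1_LOG = """\
target return: 90, eval return: 13
target return: 90, eval return: 14
target return: 90, eval return: 37
target return: 90, eval return: 26
target return: 90, eval return: 22
"""
RUN_2_LOG = """\
target return: 90, eval return: 82
target return: 90, eval return: 55
target return: 90, eval return: 63
target return: 90, eval return: 55
target return: 90, eval return: 42
"""


def eval_returns(log: str) -> List[int]:
    """Extract the eval return printed after each epoch."""
    return [int(m.group(2)) for m in EVAL_RE.finditer(log)]


for name, log in (("num_steps=100000", RUN_1_LOG),
                  ("trajectories_per_buffer=10", RUN_2_LOG)):
    returns = eval_returns(log)
    print(f"{name}: returns={returns} mean={mean(returns):.1f} max={max(returns)}")
```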

lili-chen commented 3 years ago

Yes that looks right! I will change the default trajectories_per_buffer in the scripts in case others have similar issues.

pushkalkatara commented 3 years ago

Hi, I was facing the same issue, so I tried the config replay_capacity=100000 and trajectories_per_buffer=10. The max eval return I got is 65.

this buffer has 31057 loaded transitions and there are now 510358 transitions total divided into 380 trajectories
max rtg is 98
max timestep is 2654
  0%|                                                                                                                                                   | 0/3987 [00:00<?, ?it/s]/home/pushkalkatara/mrd/conda/envs/decision-transfor-transformer-atari/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
epoch 1 iter 3986: train loss 0.82629. lr 5.598514e-04: 100%|████████████████████████████████████████████████████████████████████████████████| 3987/3987 [31:37<00:00,  2.10it/s]
target return: 90, eval return: 25
epoch 2 iter 3986: train loss 0.69285. lr 4.500607e-04: 100%|████████████████████████████████████████████████████████████████████████████████| 3987/3987 [28:00<00:00,  2.37it/s]
target return: 90, eval return: 65
epoch 3 iter 3986: train loss 0.54726. lr 3.000525e-04: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3987/3987 [29:35<00:00,  2.25it/s]
target return: 90, eval return: 32
epoch 4 iter 3986: train loss 0.49444. lr 1.500303e-04: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3987/3987 [31:52<00:00,  2.09it/s]
target return: 90, eval return: 49
epoch 5 iter 3986: train loss 0.44959. lr 6.000000e-05: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3987/3987 [27:16<00:00,  2.44it/s]
target return: 90, eval return: 65