Meta-YZ opened this issue 3 years ago (status: Open)
I have the same error here when running the command `python examples/rllib.py scenarios/loop`.
Hi @Yuanzhuo-Liu @Leoluo0320, I ran the command `python3.7 examples/rllib.py scenarios/loop` on the remote server (Ubuntu 16.04) and I could not reproduce the error mentioned above. I am able to train and get the output for different PIDs (meaning that it's working without any issue for me). Did you try setting up your environment in venv instead of conda?

@Yuanzhuo-Liu Can you please mention what command you ran to get that result?
Thanks! I think the reasons you could not reproduce my error are as follows:

So, I recommend using an Ubuntu 18.04 machine with less RAM to run a program with a longer episode count.
@JenishPatel99 @christianjans, can you please have a look at the `ultra/rllib_train.py` memory issue?
Hi, @RutvikGupta, I tried the same command `python examples/rllib.py scenarios/loop`, but instead of using the "PG" policy I changed it to "PPO". It finished the training process successfully and did not have the memory issue. I wonder if that gives you any ideas about the problem.
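For reference, the swap described above is roughly a one-line change in how the algorithm is selected. This is a hypothetical sketch, not the actual contents of `examples/rllib.py`; the real structure and valid config keys depend on the SMARTS/Ray version in use:

```python
# Hypothetical sketch: Ray Tune can reference built-in RLlib algorithms
# by string ID, so switching from PG to PPO is a matter of changing
# that ID in the arguments passed to tune.run.
run_args = {
    "run_or_experiment": "PPO",  # previously "PG"
    "config": {
        "num_workers": 1,  # illustrative value, not a SMARTS default
    },
}

# tune.run(**run_args) would then launch PPO instead of PG.
```

Note that PPO and PG have different default buffer and batch settings, which may itself change the memory profile of the run.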
Hi, @RutvikGupta, I tried venv and still have this problem. I really don't know what to do; this memory problem has been bothering me for a long time. Thanks.
Hi @RutvikGupta @JenishPatel99 @christianjans! I'm sorry to trouble you all the time. Is there any progress on my problem? Thanks!
Hi, @Yuanzhuo-Liu, I haven't been able to reproduce the problem on the server, but I will try to do it on my local machine.
- Your example episode has a shorter length. The same error occurred when I ran `python ultra/rllib_train.py --task 1 --level easy --policy ppo` and `/home/meta/SMARTS/baselines/marl_benchmark/agents/ppo/baseline-lane-control.yaml`. The common feature is that the default episode count is very long. For example, when I run `python ultra/rllib_train.py --task 1 --level easy --policy ppo`, the episode fills up at 500K of memory.
Hi @Yuanzhuo-Liu, sorry for the late reply regarding the ULTRA problem. Can I ask what you mean by "longer episode"? Do you mean a larger number for the agent's `max_episode_steps`? If you wanted to shorten the episode length, you can specify the `--max-episode-steps` argument when running `ultra/rllib_train.py`.
Also, just for clarification, what do you mean by "the episode fills up at 500k of memory"?
Finally, just to confirm, you are using a machine with 32 GB of memory to run these experiments?
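To illustrate why capping `max_episode_steps` bounds per-episode memory, here is a generic rollout-loop sketch. The `ToyEnv` class and `rollout` function are hypothetical stand-ins, not SMARTS's or RLlib's actual API:

```python
def rollout(env, max_episode_steps):
    """Collect one episode's transitions, truncating at max_episode_steps.

    The transition buffer grows linearly with the number of steps taken,
    so capping the episode length also caps the buffer's size.
    """
    transitions = []
    obs = env.reset()
    for _ in range(max_episode_steps):
        action = 0  # placeholder for a real policy's output
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions


class ToyEnv:
    """Minimal stand-in environment that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (observation, reward, done); done is always False here,
        # so only the max_episode_steps cap ends the episode.
        return 0, 0.0, False
```

With a cap of 100 steps, the buffer above holds at most 100 transitions regardless of how long the environment could run.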
Hi @christianjans, thank you very much for your reply.
"Longer episode" means the `--episodes` argument.
This type of error (`ray.memory_monitor.RayOutOfMemoryError`) occurs when the program runs to around 550K, and the program cannot continue to run.
Yes, I'm running this experiment on a 32 GB RAM machine.
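One generic way to narrow down when memory starts to climb (a sketch, not SMARTS-specific; the `train_one_episode` callable is a hypothetical stand-in for one training iteration) is to log the process's peak RSS after each episode using only the standard library:

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024


def run_with_memory_log(train_one_episode, episodes, warn_mb=28_000):
    """Run a (hypothetical) per-episode training callable, logging memory.

    Warns once peak RSS crosses warn_mb, i.e. well before a 32 GB machine
    would hit ray.memory_monitor.RayOutOfMemoryError.
    """
    history = []
    for ep in range(episodes):
        train_one_episode(ep)
        mb = peak_rss_mb()
        history.append(mb)
        if mb > warn_mb:
            print(f"episode {ep}: peak RSS {mb:.0f} MiB, nearing the limit")
    return history
```

Plotting the returned `history` shows whether memory grows steadily with episode count (a leak or unbounded buffer) or jumps at a specific point.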
Hi, @RutvikGupta @christianjans. I am very sorry for omitting some information: I updated some configuration in `rllib_train.py`.
I wonder if you can reproduce my error with it? Thanks!
xref issues #870, #557, and #855 (just linking the issues for future reference)
BUG REPORT

High Level Description
Hi! Why does the latest version still have this bug?

SMARTS version: 0.4.17

Error logs and screenshots