Question about RL training on waymo

xuqingyao commented 5 months ago

Hello, I'm trying to train an RL policy on Waymo by simply running the script python scenarionet_training/scripts/train_waymo.py --num-gpus 1. However, the program seems to stay pending after initialization. The output is as follows. May I know what the normal output should look like?

Successfully initialize Ray!
Available resources:  {'memory': 132170424525.0, 'accelerator_type:G': 1.0, 'GPU': 1.0, 'object_store_memory': 60930181939.0, 'CPU': 88.0, 'node:192.168.28.129': 1.0}
== Status ==
Current time: 2024-01-26 20:49:10 (running for 00:00:00.65)
Memory usage on this node: 299.9/472.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 6.999999999999997/88 CPUs, 0.5/1 GPUs, 0.0/123.09 GiB heap, 0.0/56.75 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /DB/data/qingyaoxu/scenarionet/experiment/TEST
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------------------------+----------+-------+--------+
| Trial name                               | status   | loc   |   seed |
|------------------------------------------+----------+-------+--------|
| MultiWorkerPPO_GymEnvWrapper_4c76d_00000 | RUNNING  |       |      0 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00001 | PENDING  |       |    100 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00002 | PENDING  |       |    200 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00003 | PENDING  |       |    300 |
| MultiWorkerPPO_GymEnvWrapper_4c76d_00004 | PENDING  |       |    400 |
+------------------------------------------+----------+-------+--------+

QuanyiLi commented 5 months ago

The output is almost what you should expect. You can set num_gpus to 0 to disable the GPU, or to any number less than 0.2, so that the five trials can share the single GPU. Then all experiments can run in parallel.
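
The arithmetic behind this advice can be sketched as follows. This is a rough back-of-the-envelope model, not Ray's actual scheduler: the function name max_concurrent_trials is hypothetical, and the per-trial figures (about 7 CPUs and 0.5 GPU) are read off the "Resources requested" line in the log above, which showed a single running trial.

```python
import math


def max_concurrent_trials(num_trials, total_cpus, total_gpus,
                          cpus_per_trial, gpus_per_trial):
    """Upper bound on how many trials fit on one node at the same time.

    Hypothetical helper for illustration only; Ray's real placement
    logic may apply additional constraints.
    """
    eps = 1e-9  # guards against float round-off in divisions such as 1.0 / 0.2
    by_cpu = math.floor(total_cpus / cpus_per_trial + eps)
    if gpus_per_trial > 0:
        by_gpu = math.floor(total_gpus / gpus_per_trial + eps)
    else:
        by_gpu = num_trials  # no GPU requested, so the GPU is not a constraint
    return min(num_trials, by_cpu, by_gpu)


# Numbers assumed from the log: 5 trials, 88 CPUs, 1 GPU, ~7 CPUs per trial.
for gpus in (0.5, 0.2, 0.0):
    print(gpus, max_concurrent_trials(5, 88, 1, 7, gpus))
```

Under these assumptions, a per-trial request of 0.5 GPU leaves room for only two trials at once, while 0.2 or 0 lets all five be scheduled together.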

QuanyiLi commented 5 months ago

It is a feature of Ray. You can take a look at the Ray 1.2 documentation (it is an outdated version, lol). In my experience, I usually set num_gpus=0, since CPUs are enough to handle the optimization of these MLPs. If a GPU is used, moving data between devices is costly, while the speedup from the GPU is not significant.

xuqingyao commented 4 months ago

Thank you for your advice. The code ran successfully once I set num_gpus to 0. However, I encountered another problem. I followed python -m scenarionet.convert_waymo -d /path/to/your/database --raw_data_path ./waymo/training_20s --num_files=1000 to build the data, but I used the v1.1 version of Waymo. When training the RL policy, I found that the crosswalk data of scene 126 (id bec43944a9017106) consists of 3D points rather than the required 2D points. Is this a problem caused by the original data? To work around it, my current approach is to keep only the first two dimensions of each point as input. Is this correct? Will it cause any problems?

  File "/DB/data/qingyaoxu/metadrive/metadrive/envs/base_env.py", line 522, in reset
    self.engine.reset()
  File "/DB/data/qingyaoxu/metadrive/metadrive/engine/base_engine.py", line 354, in reset
    manager.reset()
  File "/DB/data/qingyaoxu/metadrive/metadrive/manager/scenario_map_manager.py", line 41, in reset
    new_map = ScenarioMap(map_index=seed, map_data=m_data)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/scenario_map.py", line 19, in __init__
    super(ScenarioMap, self).__init__(dict(id=self.map_index), random_seed=random_seed)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/base_map.py", line 63, in __init__
    self._generate()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/map/scenario_map.py", line 36, in _generate
    block.construct_block(self.engine.worldNP, self.engine.physics_world, attach_to_world=True)
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 124, in construct_block
    self._create_in_world()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 226, in _create_in_world
    self.create_in_world()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/scenario_block/scenario_block.py", line 72, in create_in_world
    self._construct_crosswalk()
  File "/DB/data/qingyaoxu/metadrive/metadrive/component/block/base_block.py", line 411, in _construct_crosswalk
    np = make_polygon_model(polygon, 1.5)
  File "/DB/data/qingyaoxu/metadrive/metadrive/utils/vertex.py", line 108, in make_polygon_model
    elif not is_anticlockwise(points) and auto_anticlockwise:
  File "/DB/data/qingyaoxu/metadrive/metadrive/utils/vertex.py", line 75, in is_anticlockwise
    x1, y1 = points[i]
ValueError: too many values to unpack (expected 2)
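
The workaround described above (keeping only the first two coordinates of each crosswalk point) might look like the sketch below. This is not MetaDrive's own fix: to_2d_polygon is a hypothetical helper, the sample coordinates are made up, and dropping z is only safe if nothing downstream needs the elevation.

```python
import numpy as np


def to_2d_polygon(points):
    """Drop any z coordinate and return an (N, 2) array of x, y points.

    Hypothetical helper for illustration; not part of MetaDrive.
    """
    arr = np.asarray(points, dtype=float)
    if arr.ndim != 2 or arr.shape[1] < 2:
        raise ValueError(
            f"expected an (N, 2) or (N, 3) array, got shape {arr.shape}"
        )
    return arr[:, :2]


# Made-up crosswalk polygon with a constant z value, as in the failing scene.
crosswalk_3d = [(0.0, 0.0, 1.2), (4.0, 0.0, 1.2), (4.0, 2.0, 1.2), (0.0, 2.0, 1.2)]
polygon = to_2d_polygon(crosswalk_3d)

# Two-value unpacking, as in the line that raised the error, now works.
for x, y in polygon:
    print(x, y)
```

Points that are already 2D pass through unchanged, so the helper can be applied to every polygon without checking its dimensionality first.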

QuanyiLi commented 4 months ago

Could you try pulling the latest MetaDrive and running your script again?

metadriverse / scenarionet

Question about RL training on waymo #53