wayveai / mile

PyTorch code for the paper "Model-Based Imitation Learning for Urban Driving".
MIT License

CARLA Simulator Timeouts for Data Collection #44

Open jorchiu opened 1 month ago

jorchiu commented 1 month ago

I'm running the data collection scenario with W&B syncing set to offline. These are the commands I ran to start data collection:

export WANDB_MODE=offline
bash run/data_collect.sh /home/user1/carla_0_9_11/CarlaUE4.sh /home/user1/mile/dataset_collect 2000 lb_data

However, I get the following RuntimeError saying the simulator timed out:

RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /home/user1/mile/outputs/2024-05-01/18-16-17/wandb/offline-run-20240501_181626-213560f2
wandb: Find logs at: ./wandb/offline-run-20240501_181626-213560f2/logs
PYTHON_RETURN=1!!! Start Over!!!

Does the data collection scenario require running CARLA with a GUI? I'm connected to the machine over SSH, so the scenario runs in headless mode.

anthonyhu commented 1 month ago

Hello,

You do not need to run CARLA with a GUI; I also collected the data by SSH'ing into my machine and running the script from there. I've encountered that error in the past as well; here are some possible solutions:

jorchiu commented 1 month ago

Hi Anthony,

Port 2000 is free on my machine. I also tried a different port, but I still get the same error after multiple attempts.
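One way to double-check that a port really is free is to try binding to it. The sketch below is a minimal Python example under a few assumptions: the simulator runs on the same machine, and, as in CARLA's default setup, the server uses both the RPC port and the next port up (2000 and 2001 here), so both should be free before launching. The helper name is_port_free is hypothetical, not part of mile or CARLA.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port), i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error) means this port cannot be used.
            return False
    return True


if __name__ == "__main__":
    rpc_port = 2000
    # Assumption: CARLA also uses rpc_port + 1 for its streaming server.
    for port in (rpc_port, rpc_port + 1):
        print(port, "free" if is_port_free(port) else "in use")
```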

Not sure whether this matters, but does W&B syncing need to be set to online? I've set it to offline.

anthonyhu commented 1 month ago

I haven't tried with wandb offline myself but it should work in theory. Could you try setting it to online and see if it solves your issue?

DianeHadley commented 1 week ago

@anthonyhu I've picked up the same project as Jordan, and I get the same error with W&B set to online:

(mile) dihadley3-adas-0614@dihadley3-adas-0614:~/mile$ bash run/data_collect.sh $CARLA_ROOT/CarlaUE4.sh /home/dihadley3-adas-0614/mile_data/ 2000 lb_data
rm: cannot remove 'outputs/port_2000_checkpoint.txt': No such file or directory
rm: cannot remove 'outputs/port_2000_ep_stat_buffer_*.json': No such file or directory
[2024-06-14 19:46:51,891][utils.server_utils][INFO] - Killed Carla Servers on port 2000!
[2024-06-14 19:46:52,902][utils.server_utils][INFO] - Killed Carla Servers on port 2000!
[2024-06-14 19:46:53,907][utils.server_utils][INFO] - CUDA_VISIBLE_DEVICES=None bash /home/dihadley3-adas-0614/carla_0_9_11/CarlaUE4.sh -fps=25 -quality-level=Epic -carla-rpc-port=2000
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /home/dihadley3-adas-0614/.netrc
[2024-06-14 19:47:21,435][mile.agents.rl_birdview.rl_birdview_agent][INFO] - Resume checkpoint latest ckpt/ckpt_11833344.pth
[2024-06-14 19:47:22,191][mile.agents.rl_birdview.rl_birdview_agent][INFO] - Loading wandb checkpoint: ckpt/ckpt_11833344.pth
[2024-06-14 19:47:22,389][__main__][INFO] - Start from env_idx: 0, task_idx 0
wandb: Currently logged in as: dianehadley27 (diane-hadley). Use `wandb login --relogin` to force relogin
wandb: wandb version 0.17.1 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.13.9
wandb: Run data is saved locally in /home/dihadley3-adas-0614/mile/outputs/2024-06-14/19-46-51/wandb/run-20240614_194722-9sgwl85s
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mile_data
wandb: ⭐️ View project at https://wandb.ai/diane-hadley/mile
wandb: 🚀 View run at https://wandb.ai/diane-hadley/mile/runs/9sgwl85s
Error executing job with overrides: ['carla_sh_path=/home/dihadley3-adas-0614/carla_0_9_11/CarlaUE4.sh', 'dataset_root=/home/dihadley3-adas-0614/mile_data/', 'port=2000', 'test_suites=lb_data']
Traceback (most recent call last):
  File "data_collect.py", line 175, in main
    env = gym.make(env_setup['env_id'], obs_configs=obs_configs, reward_configs=reward_configs,
  File "/home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages/gym/envs/registration.py", line 145, in make
    return registry.make(id, **kwargs)
  File "/home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages/gym/envs/registration.py", line 90, in make
    env = spec.make(**kwargs)
  File "/home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages/gym/envs/registration.py", line 60, in make
    env = cls(**_kwargs)
  File "/home/dihadley3-adas-0614/mile/carla_gym/envs/suites/endless_env.py", line 10, in __init__
    super().__init__(carla_map, host, port, seed, no_rendering,
  File "/home/dihadley3-adas-0614/mile/carla_gym/carla_multi_agent_env.py", line 31, in __init__
    self._init_client(carla_map, host, port, seed=seed, no_rendering=no_rendering)
  File "/home/dihadley3-adas-0614/mile/carla_gym/carla_multi_agent_env.py", line 155, in _init_client
    self._world = client.load_world(carla_map)
RuntimeError: time-out of 60000ms while waiting for the simulator, make sure the simulator is ready and connected to localhost:2000

However, I suspect the problem may be with the CARLA package installation:

(mile) dihadley3-adas-0614@dihadley3-adas-0614:~/mile$ easy_install /home/dihadley3-adas-0614/carla_0_9_11/PythonAPI/carla/dist/carla-0.9.11-py3.7-linux-x86_64.egg
WARNING: The easy_install command is deprecated and will be removed in a future version.
Processing carla-0.9.11-py3.7-linux-x86_64.egg
creating /home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages/carla-0.9.11-py3.7-linux-x86_64.egg
Extracting carla-0.9.11-py3.7-linux-x86_64.egg to /home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages
Adding carla 0.9.11 to easy-install.pth file

Installed /home/dihadley3-adas-0614/miniconda3/envs/mile/lib/python3.8/site-packages/carla-0.9.11-py3.7-linux-x86_64.egg
Processing dependencies for carla==0.9.11
Searching for carla==0.9.11
Reading https://pypi.org/simple/carla/
No local packages or working download links found for carla==0.9.11
error: Could not find suitable distribution for Requirement.parse('carla==0.9.11')

This CARLA 0.9.11 egg is built for Python 3.7, but this repo uses Python 3.8. Should I be using a different CARLA egg file for Python 3.8?

pip list in the conda environment shows that carla is installed, but judging by the message "waiting for the simulator, make sure the simulator is ready and connected to localhost:2000", the simulator does not seem to be running properly.
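The Python version an egg was built for is encoded in its filename (the py3.7 tag here). A quick way to compare that tag with the interpreter in the active conda environment is sketched below; egg_python_version and matches_interpreter are hypothetical helper names, and a mismatch only flags something worth checking. It does not by itself prove that the import will fail.

```python
import re
import sys
from typing import Optional, Tuple

# Egg filenames follow name-version-pyX.Y[-platform].egg, e.g.
# carla-0.9.11-py3.7-linux-x86_64.egg
_PY_TAG = re.compile(r"-py(\d+)\.(\d+)(?:-|\.egg$)")


def egg_python_version(filename: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the pyX.Y tag of an egg filename, or None if absent."""
    match = _PY_TAG.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_interpreter(filename: str) -> bool:
    """True if the egg's Python tag equals the running interpreter's major.minor version."""
    return egg_python_version(filename) == tuple(sys.version_info[:2])


if __name__ == "__main__":
    egg = "carla-0.9.11-py3.7-linux-x86_64.egg"
    print("egg built for:", egg_python_version(egg))
    print("interpreter:", tuple(sys.version_info[:2]))
    print("match:", matches_interpreter(egg))
```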

anthonyhu commented 5 days ago

Hello Diane,

The carla package error is more of a packaging warning than a real failure: the installer looks up version 0.9.11 on PyPI to resolve the package's dependencies and does not find it there. The carla package itself should still be installed correctly.
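To confirm that the package can be found from the mile environment despite that message, something like the following can be run with the environment's interpreter. It is a minimal sketch using only the standard library, and locate_module is a hypothetical helper name. Note that find_spec only shows that Python can locate carla; actually running import carla is the stronger check, because it also loads the compiled extension.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return where Python would load a top-level module from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is the file path for regular modules and packages;
    # namespace packages have no single file, so report that instead.
    return spec.origin if spec.origin is not None else "<namespace package>"


if __name__ == "__main__":
    location = locate_module("carla")
    print("carla not found" if location is None else f"carla found at {location}")
```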

Have you tried running the CARLA simulator by itself? Can you see the CARLA graphical interface and move in the environment?
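When the simulator is started on its own over SSH, where there is no window to look at, one rough check is whether its RPC port starts accepting TCP connections. A minimal sketch, assuming the default port 2000 and a hypothetical helper name wait_for_port. An open port only shows that the server process is listening, not that it can load a map, which is the call (client.load_world) that timed out in the traceback earlier in this thread.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, poll_s: float = 0.5) -> bool:
    """Poll until (host, port) accepts a TCP connection or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(poll_s)
    return False
```

For example, after starting CarlaUE4.sh in another shell, wait_for_port("localhost", 2000) should return True within a minute if the simulator came up.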

jorchiu commented 3 days ago

Hi @anthonyhu,

I was able to run the data collection scenario. The fix was switching to a machine with a dedicated NVIDIA GPU that has at least 6 GB of VRAM, which the CARLA simulator needs to run.
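To check a machine against that requirement before launching CARLA, nvidia-smi can report each GPU's total memory. A minimal sketch, assuming the NVIDIA driver (and therefore nvidia-smi) is installed and reading "6 GB" as 6 GiB; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def parse_memory_mib(output: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` output (MiB per GPU)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def query_gpu_memory_mib() -> List[int]:
    """Total memory of each visible NVIDIA GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, timeout=30, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


def has_enough_vram(memory_mib: List[int], min_gib: float = 6.0) -> bool:
    """True if at least one GPU has min_gib GiB of memory or more."""
    return any(mib >= min_gib * 1024 for mib in memory_mib)


if __name__ == "__main__":
    gpus = query_gpu_memory_mib()
    print("GPU memory (MiB):", gpus or "no NVIDIA GPU detected")
    print("meets 6 GiB:", has_enough_vram(gpus))
```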

Can you recommend a test suite that takes less time to collect but still provides enough data to run the training scenario afterwards?

Initially I ran data collection with the lb_data test suite, but it takes a long time.

anthonyhu commented 3 days ago

Great to see you made it work. If collecting data takes a while, you can limit yourself to Town01 for experimentation (https://github.com/wayveai/mile/blob/main/config/test_suites/lb_town01.yaml) and evaluate only on Town02, a smaller version of Town01 with a different layout, by removing Town05 from lb_test.yaml.