opendilab / InterFuser

[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
Apache License 2.0

Would you please provide the Dockerfile? #23

Status: Closed (closed by xanhug 1 year ago)

xanhug commented 1 year ago

Sorry to bother you again, and thanks for your help. My tutor has asked me to reproduce your experimental results, and I want to do this by submitting the InterFuser agent to the CARLA leaderboard. However, leaderboard_evaluator.py does not seem to work normally in Docker. I'm not sure whether this is because I have not configured the "Dockerfile.master" file correctly. Here is the log:

(base) hxa@hxa-Nitro-AN515-58:~/InterFuser$ docker run -it --net=host --gpus all leaderboard-user /bin/bash
root@hxa-Nitro-AN515-58:/workspace# ./leaderboard/scripts/run_evaluation.sh
WARNING: Version mismatch detected: You are trying to connect to a simulator that might be incompatible with this API 
WARNING: Client API version     = 784d9b9f 
WARNING: Simulator API version  = 0.9.10.1 
Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 480, in main
    leaderboard_evaluator = LeaderboardEvaluator(arguments, statistics_manager)
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 98, in __init__
    self.module_agent = importlib.import_module(module_name)    # import the required model agent file
  File "/opt/conda/envs/python37/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "leaderboard/team_code/interfuser_agent.py", line 7, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'
Exception ignored in: <function LeaderboardEvaluator.__del__ at 0x7fc30a2e6680>
Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 125, in __del__
    self._cleanup()
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 137, in _cleanup
    if self.manager and self.manager.get_running_status() \
AttributeError: 'LeaderboardEvaluator' object has no attribute 'manager'
Traceback (most recent call last):
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 490, in <module>
    main()
  File "leaderboard/leaderboard/leaderboard_evaluator.py", line 486, in main
    del leaderboard_evaluator
UnboundLocalError: local variable 'leaderboard_evaluator' referenced before assignment

When I run run_evaluation.sh directly in a terminal, outside Docker, it seems to work normally:

(interfuser) hxa@hxa-Nitro-AN515-58:~/InterFuser$ ./leaderboard/scripts/run_evaluation.sh
WARNING: Version mismatch detected: You are trying to connect to a simulator that might be incompatible with this API 
WARNING: Client API version     = 784d9b9f 
WARNING: Simulator API version  = 0.9.10.1 
leaderboard/leaderboard/leaderboard_evaluator.py:92: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(dist.version) < LooseVersion('0.9.10'):
pygame 2.1.2 (SDL 2.0.16, Python 3.7.16)
Hello from the pygame community. https://www.pygame.org/contribute.html

========= Preparing RouteScenario_16 (repetition 0) =========
> Setting up the agent
load model: leaderboard/team_code/interfuser.pth.tar
routes_town05_long_03_02_17_25_56
> Loading the world
load
load
load
load
load
load
load
Skipping scenario 'Scenario4' due to setup error: list index out of range
> Running the route

[Image: Screenshot from 2023-03-02 17-30-06]

Thanks for your help.

deepcs233 commented 1 year ago

Hi! It looks like the Python environment in Docker lacks the cv2 package. You can check Dockerfile.master and add commands that install the required packages. One approach is to install packages and fix other bugs inside your running container until the evaluation runs normally, recording each operation as you go, and then copy those operations into the Dockerfile.
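Before rebuilding the image, it can help to list which packages the container's Python environment still lacks. The following is a minimal sketch of such a check, not part of the repo: `missing_modules` is a hypothetical helper, and the module list is an assumption for illustration (only cv2 is confirmed missing by the traceback above).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list for illustration; extend it with whatever the agent file imports.
required = ["cv2", "torch", "numpy", "carla"]
print("Missing modules:", missing_modules(required))
```

Running this inside the container (for example with the same python37 environment that appears in the traceback) prints the names that still need an install line in the Dockerfile. Note that cv2 is normally provided by the opencv-python package on PyPI, so the import name and the install name differ.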

Building the Docker image took me a lot of time and many steps. I have uploaded my old Dockerfile.master to leaderboard/scripts/Dockerfile_example.master and hope it helps (I cannot guarantee that it still builds a usable image).

By the way, using the code and model provided directly by the repo may not give results similar to those on the leaderboard. The model is only a demo for showcase purposes.

xanhug commented 1 year ago

@deepcs233 Many thanks for your kind help

hz3014 commented 1 year ago

Hi @deepcs233, I was wondering whether we have to use the CUDA 9.0 provided in the Dockerfile in order to submit to the online leaderboard. CUDA 9.0 is quite outdated, and we would have to downgrade PyTorch considerably compared to your requirements.

deepcs233 commented 1 year ago

Sorry, I only tried the default CUDA version provided in the Dockerfile. When I submitted the model to the leaderboard, I only had to convert the checkpoint format for a lower PyTorch version; the rest of the code worked well. I don't know the driver version of the test server; it may support a higher CUDA version.

hz3014 commented 1 year ago

Hi @deepcs233, after some modifications I was able to use Docker to evaluate with a local CARLA simulator. I have a few questions about online submission:

  1. Do I need to modify the ./leaderboard/scripts/run_evaluation.sh file for online submission? Currently I am mostly using your code, with some paths changed.
  2. Do I need to modify export ROUTES=leaderboard/data/training_routes/routes_town05_long.xml?
  3. Does the online server automatically change the evaluation route, or should we use the default setting provided by the original leaderboard repo?

deepcs233 commented 1 year ago

Hi

  1. Probably not required
  2. Not required
  3. In my experience, the online server automatically changes the evaluation route and other environment paths.
  4. Make sure that in your local Docker container, after adding the required environment variables (TEAM_CODE, CARLA_ROOT, ROUTES, ...), the evaluation runs normally.
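
A quick way to confirm that the variables named in point 4 are set inside the container before launching the evaluation is sketched below. This is an illustration only: `unset_variables` is a hypothetical helper, and the list contains just the three names mentioned above, so the evaluation script may need more.

```python
import os

# Names taken from the list above; the real script may require others as well.
REQUIRED_VARS = ("TEAM_CODE", "CARLA_ROOT", "ROUTES")


def unset_variables(names, environ=None):
    """Return the names that are missing or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


print("Unset variables:", unset_variables(REQUIRED_VARS))
```

An empty list means every listed variable has a non-empty value; it does not check that the paths they point to actually exist.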

hz3014 commented 1 year ago

Dear @deepcs233, I tried to retrain your model and submit it to the online leaderboard. I collected around 1.5 TB of data. However, my evaluation driving score is only around 30%, with a route completion of 68.208 and an infraction rate of 0.479.

This score is far lower than your reported result. Any hints on what might have gone wrong?
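As a rough sanity check on these numbers, the CARLA leaderboard defines the per-route driving score as route completion multiplied by an infraction penalty. The sketch below assumes that the "infraction rate" of 0.479 quoted above is that penalty factor, which the thread does not confirm. Since the official score averages the per-route products, multiplying the two averages is only an approximation.

```python
def approximate_driving_score(route_completion, infraction_penalty):
    """Approximate driving score as route completion (in percent) times the penalty factor."""
    return route_completion * infraction_penalty


# Values quoted in the comment above; treating 0.479 as the penalty is an assumption.
score = approximate_driving_score(68.208, 0.479)
print(round(score, 1))  # 32.7
```

Under that assumption the product is consistent with the "around 30%" driving score reported here.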

deepcs233 commented 1 year ago

I'm sorry, that didn't happen to me, but I have some suggestions:

  1. The online leaderboard does not appear to take stop signs into account, so you may want to modify your training code accordingly.
  2. The controller hyper-parameters can be adjusted, depending on your specific model.
  3. Have you collected night data, and have you removed frames in which the vehicle did not move for a long time?

hz3014 commented 1 year ago

Dear @deepcs233, thank you for your reply. Regarding suggestion 1: are you suggesting that we should remove the slow-down actions for stop signs during online leaderboard evaluation?

I have also noticed that during local evaluation, when approaching a stop sign, the predicted trajectory is very short (almost none) and the agent is no longer able to move. Have you encountered this issue before? I guess something went wrong during data collection for the stop-sign scenario.

Speaking of data collection, I noticed that much of the collected data contains frames in which the ego vehicle is blocked for a long time. I tried to re-collect the same scenarios, but the vehicle is still blocked and nothing much changed. Is there any way to overcome this issue? I guess simply removing those frames would leave the agent unable to learn such scenarios?

I will look into the hyper-parameter adjustment, and I have collected night data. By the way, my local CARLA 42 routes benchmark gives a better result, around 80% driving score, although that is still 10% lower than the value reported for InterFuser. The gap is much smaller than on the online leaderboard.

Looking forward to your reply : )

deepcs233 commented 1 year ago

Hi! Sorry for the late response, I hadn't noticed it because it's a closed issue :)

> I have also noticed that during local evaluation, when approaching a stop sign, the predicted trajectory is very short (almost none) and the agent is no longer able to move. Have you encountered this issue before? I guess something went wrong during data collection for the stop-sign scenario.

I haven't noticed this situation. The predicted trajectory in our project is the future route that the ego car will drive along, like a navigation route. So I don't think the collected ground truth (GT) here should be very short.

> Speaking of data collection, I noticed that much of the collected data contains frames in which the ego vehicle is blocked for a long time. I tried to re-collect the same scenarios, but the vehicle is still blocked and nothing much changed. Is there any way to overcome this issue? I guess simply removing those frames would leave the agent unable to learn such scenarios?

There is no need to re-collect the scenarios. We can delete the blocked frames and re-arrange the remaining ones. The corresponding code can be found in tools/data.
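
The idea of deleting blocked frames and re-arranging the rest can be sketched as follows. This is not the code in tools/data; it is a minimal illustration that assumes a per-frame speed is available, and the function names, the `speed_eps` threshold, and the `max_run` limit are all hypothetical.

```python
def frames_to_keep(speeds, speed_eps=0.1, max_run=10):
    """Return the indices of frames to keep.

    A frame counts as stationary when its speed is below `speed_eps`.
    Within each run of consecutive stationary frames, only the first
    `max_run` frames are kept, so short stops remain in the data.
    """
    keep = []
    run_length = 0
    for index, speed in enumerate(speeds):
        if speed < speed_eps:
            run_length += 1
            if run_length > max_run:
                continue  # drop frames beyond the allowed stationary run
        else:
            run_length = 0
        keep.append(index)
    return keep


def renumber(kept_indices):
    """Map each kept original index to a new contiguous index."""
    return {old: new for new, old in enumerate(kept_indices)}


speeds = [5.0, 0.0, 0.0, 0.0, 0.0, 3.0]
kept = frames_to_keep(speeds, max_run=2)
print(kept)            # [0, 1, 2, 5]
print(renumber(kept))  # {0: 0, 1: 1, 2: 2, 5: 3}
```

Keeping the first few stationary frames of each run is a design choice in this sketch: it preserves examples of the agent stopping while trimming the long blocked stretches.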