mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

Torchpack test runs indefinitely #142

Closed: ssuralcmu closed this issue 2 years ago

ssuralcmu commented 2 years ago

When I run the command from the README file:

torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox

The program runs indefinitely and does not produce output. Running with -np 1 does not make a difference. I have two NVIDIA GeForce RTX 2080 Ti GPUs.

Everything else before this step seems to have been installed without issues.

Any idea what might be going wrong?

kentang-mit commented 2 years ago

Hello @ssuralcmu,

Would you mind adding a -v flag to your command, like this?

torchpack dist-run -np 8 -v python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox

Best, Haotian

ssuralcmu commented 2 years ago

Hi @kentang-mit,

Thanks for your response.

Adding -v prints the following:

mpirun --allow-run-as-root -np 8 -H localhost:8 -bind-to none -map-by slot -x CLUTTER_IM_MODULE -x COMPIZ_BIN_PATH -x COMPIZ_CONFIG_PROFILE -x DBUS_SESSION_BUS_ADDRESS -x DEFAULTS_PATH -x DESKTOP_SESSION -x DISPLAY -x GDMSESSION -x GDM_LANG -x GNOME_DESKTOP_SESSION_ID -x GNOME_KEYRING_CONTROL -x GNOME_KEYRING_PID -x GPG_AGENT_INFO -x GTK2_MODULES -x GTK_IM_MODULE -x GTK_MODULES -x HOME -x IM_CONFIG_PHASE -x INSTANCE -x JOB -x LANG -x LANGUAGE -x LC_ALL -x LD_LIBRARY_PATH -x LESSCLOSE -x LESSOPEN -x LOGNAME -x LS_COLORS -x MANDATORY_PATH -x MASTER_HOST -x PATH -x PWD -x QT4_IM_MODULE -x QT_ACCESSIBILITY -x QT_IM_MODULE -x QT_LINUX_ACCESSIBILITY_ALWAYS_ON -x QT_QPA_PLATFORMTHEME -x SESSION -x SESSIONTYPE -x SESSION_MANAGER -x SHELL -x SHLVL -x SSH_AUTH_SOCK -x TERM -x UE4_ROOT -x UNITY_DEFAULT_PROFILE -x UNITY_HAS_3D_SUPPORT -x UPSTART_EVENTS -x UPSTART_INSTANCE -x UPSTART_JOB -x UPSTART_SESSION -x USER -x VTE_VERSION -x WINDOWID -x XAUTHORITY -x XDG_CONFIG_DIRS -x XDG_CURRENT_DESKTOP -x XDG_DATA_DIRS -x XDG_GREETER_DATA_DIR -x XDG_MENU_PREFIX -x XDG_RUNTIME_DIR -x XDG_SEAT -x XDG_SEAT_PATH -x XDG_SESSION_DESKTOP -x XDG_SESSION_ID -x XDG_SESSION_PATH -x XDG_SESSION_TYPE -x XDGVTNR -x XMODIFIERS -x -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python3.8 tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox

kentang-mit commented 2 years ago

I see, so you still don't see any output other than that, right? In that case, may I ask which OpenMPI version you have? We recommend installing version 4.0.4, together with mpi4py 3.0.3.
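One way to check the versions asked about here is sketched below. It assumes that the `mpirun` and Python interpreter on `PATH` are the ones torchpack actually uses; `PYTHON_BIN` is a hypothetical variable introduced only for this example (the verbose output in this thread shows `python3.8`). `mpi4py.get_config()` reports the compiler wrapper mpi4py was built with, which can help spot a mismatch between the OpenMPI on `PATH` and the one mpi4py was compiled against.

```shell
# Report the MPI launcher that is first on PATH and its version.
if command -v mpirun >/dev/null 2>&1; then
    command -v mpirun
    mpirun --version | head -n 1
else
    echo "mpirun not found on PATH"
fi

# PYTHON_BIN is a hypothetical override for this sketch; defaults to "python".
PYTHON_BIN="${PYTHON_BIN:-python}"

# Report the installed mpi4py version and the build configuration it recorded.
if command -v "$PYTHON_BIN" >/dev/null 2>&1; then
    "$PYTHON_BIN" -c 'import mpi4py; print("mpi4py", mpi4py.__version__); print(mpi4py.get_config())' \
        || echo "mpi4py is not importable from $PYTHON_BIN"
else
    echo "$PYTHON_BIN not found on PATH"
fi
```

If the launcher path points somewhere unexpected (for example a manually installed OpenMPI), that is worth ruling out before looking at the test script itself.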

ssuralcmu commented 2 years ago

No, nothing else gets printed. I do have OpenMPI 4.0.4 and mpi4py 3.0.3.

kentang-mit commented 2 years ago

I see. @zhijian-liu can probably offer more insight into this situation, since he is the author of torchpack.

zhijian-liu commented 2 years ago

This seems very weird. Could you please run torchpack dist-run -np 2 hostname to see whether there is any output?
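The same sanity check can be run against `mpirun` directly and under a time limit, so that a hang shows up as a clear message instead of a stalled terminal. This is only a sketch: it assumes GNU `timeout` is available, and `TIMEOUT_SECS` is a hypothetical variable for this example, not a torchpack or OpenMPI option.

```shell
# TIMEOUT_SECS is a hypothetical knob for this sketch (seconds before giving up).
TIMEOUT_SECS="${TIMEOUT_SECS:-60}"

if command -v mpirun >/dev/null 2>&1 && command -v timeout >/dev/null 2>&1; then
    # Launch two trivial processes; each should print the machine's hostname.
    timeout "$TIMEOUT_SECS" mpirun -np 2 hostname
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit is reached.
        echo "mpirun did not finish within ${TIMEOUT_SECS}s; the MPI setup itself is likely at fault"
    elif [ "$status" -ne 0 ]; then
        echo "mpirun exited with code $status"
    fi
else
    echo "mpirun or timeout not found on PATH"
    status=127
fi
```

If this trivial job also hangs, the problem probably lies in the MPI installation rather than in `tools/test.py`.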

kentang-mit commented 2 years ago

You can also have a look at these two PRs, https://github.com/mit-han-lab/bevfusion/pull/144 and https://github.com/mit-han-lab/bevfusion/pull/145, which provide Docker setups. There is a step-by-step environment setup in this file. You can check whether your setup matches mine.

Ilyabasharov commented 2 years ago

@ssuralcmu I faced this issue too. I solved it by following the Docker instructions:

apt-get install openmpi-bin openmpi-common libopenmpi-dev
pip install torchpack mpi4py==3.0.3 numba==0.48.0 --force-reinstall

Before following these instructions, I had downloaded OpenMPI directly from here, and that caused the issue.

ssuralcmu commented 2 years ago

Hi, using the Docker instructions and an -np value of 2 fixed the issue. I also had to pull the latest version of the repository, which was modified a couple of days ago. I can now obtain the evaluation results as intended. Thanks a lot for your help!
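For anyone hitting the same problem on a machine with fewer than 8 GPUs, the process count can be derived from the number of visible GPUs instead of being copied from the README. The sketch below assumes one process per GPU, which matches the setup that worked here (two GPUs, -np 2); the thread does not establish whether the -np change or the OpenMPI reinstall was the decisive fix. `NUM_GPUS` is a name introduced only for this example, and the command is printed rather than executed.

```shell
# Count the GPUs listed by the NVIDIA driver; fall back to 1 if nvidia-smi is unavailable.
if command -v nvidia-smi >/dev/null 2>&1; then
    NUM_GPUS=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ')
else
    NUM_GPUS=1
fi

# Guard against a count of 0 (for example, when the driver is not loaded).
[ "$NUM_GPUS" -ge 1 ] || NUM_GPUS=1

# Print the evaluation command with the derived process count.
echo "torchpack dist-run -np $NUM_GPUS python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox"
```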