mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License

Errors while Evaluating the Pretrained Model #6

Closed HenryJunW closed 4 years ago

HenryJunW commented 4 years ago

Thanks for your wonderful work.

I got errors while evaluating with 'torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs':

/bin/sh: -c: line 0: syntax error near unexpected token (' /bin/sh: -c: line 0:mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x 
TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python -m torchpack.launch.assets.silentrun python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs'

I checked that torchpack and torchsparse are installed successfully. Any idea how to solve the issue? Thank you in advance.

kentang-mit commented 4 years ago

Hi,

Thanks for your interest in our work!

I didn't run into the same problem when evaluating the pretrained model on my machine, so I can only try to find the cause together with you through discussion. Specifically, have you also installed the OpenMPI library successfully on your machine? Also, please try to run

torchpack dist-run -np 1 -v python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs

to see what command is printed. Thanks!

Best, Haotian

HenryJunW commented 4 years ago

Thanks for your instant reply. Previously I had installed mpich via conda; I just removed it and installed OpenMPI instead. Unfortunately, I got similar results even after adding -v, as shown below. Do you think this is related to the first four claims of another issue, https://github.com/mit-han-lab/e3d/issues/2#issue-698773318? I also tried adding '-H hostname:slots', which didn't help much. Thanks.

mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x 
XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs /bin/sh: -c: line 0: syntax error near unexpected token (' /bin/sh: -c: line 0:mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x 
SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs'

kentang-mit commented 4 years ago

Hi,

I've checked the command you provided. It turns out that the BASH_FUNC_module() part of your command causes the problem. As a temporary workaround, I suggest you run

export MASTER_HOST=127.0.0.1:[Arbitrary Port Number]

and

mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x "BASH_FUNC_module()" -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x 
XDG_RUNTIME_DIR -x XDG_SESSION_ID -x _ -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs

to check whether you can successfully run the inference. Note that I wrapped -x BASH_FUNC_module() in a pair of double quotes ("").
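The failure can be reproduced without mpirun at all: unquoted parentheses are operators to the shell, so when the generated command line is re-parsed by /bin/sh, the bare BASH_FUNC_module() token triggers exactly the reported syntax error, while the quoted form is treated as an ordinary argument. A minimal sketch (the echo arguments below are stand-ins for the real mpirun flags):

```shell
# Unquoted parentheses make /bin/sh fail with
# "syntax error near unexpected token `('":
if sh -c 'echo -x BASH_FUNC_module()' 2>/dev/null; then
    echo "unquoted: parsed"
else
    echo "unquoted: syntax error"
fi

# Quoting the token turns it into a plain word, and the command runs:
sh -c 'echo -x "BASH_FUNC_module()"'
```

Running this prints "unquoted: syntax error" followed by "-x BASH_FUNC_module()". (BASH_FUNC_module() itself comes from bash exporting the `module` shell function of the environment-modules system into the environment.)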

For reference, I also ran our code in an Anaconda environment, and the generated command is much shorter than yours. I'll sync with Zhijian (the author of torchpack and a co-author of the SPVNAS project) on the details of the mpirun arguments. I hope this temporary solution works for you.

Best, Haotian

HenryJunW commented 4 years ago

It works, and I can evaluate successfully following your instructions. You are right; it was most likely the unquoted 'BASH_FUNC_module()'. Thanks for your effort and time; I appreciate it very much.

zhijian-liu commented 4 years ago

Hi @HenryJunW, I've added a patch in the latest torchpack to fix this issue. Could you please install the latest version and have a try? Please let me know if you have any questions. Thanks!

HenryJunW commented 4 years ago

@zhijian-liu Works perfectly this time. Thanks.

HenryJunW commented 4 years ago

To evaluate the pre-trained SPVCNN / MinkUNet models, the code at https://github.com/mit-han-lab/e3d/blob/4c494f211f36e368d291d39088c8122b34d1d227/spvnas/evaluate.py#L67-L69 is missing the imports from model_zoo:

'from model_zoo import spvcnn' and 'from model_zoo import minkunet'

kentang-mit commented 4 years ago

To evaluate the pre-trained SPVCNN / MinkUNet models, the code at https://github.com/mit-han-lab/e3d/blob/4c494f211f36e368d291d39088c8122b34d1d227/spvnas/evaluate.py#L67-L69 is missing the imports from model_zoo: 'from model_zoo import spvcnn' and 'from model_zoo import minkunet'

Thanks for your reminder, I have fixed the code.

HenryJunW commented 4 years ago

Cool. I have one more question: how long did it take you to train the SemanticKITTI_val_SPVCNN@119GMACs model? Can you specify the number of GPUs and the training time? (For evaluation, a single 1080Ti is mentioned in the paper.) Also, is it possible for you to release the training configuration files for the SPVNAS models? I am wondering whether to directly use the net.config file provided at https://hanlab.mit.edu/files/SPVNAS/spvnas_specialized/SemanticKITTI_val_SPVNAS@65GMACs/ as the yaml file. Thanks.