Closed: HenryJunW closed this issue 4 years ago.
Hi,
Thanks for your interest in our work!
I didn't experience the same problem when evaluating the pretrained model on my machine, so I can only try to find the cause together with you through discussion. Specifically, have you also installed the OpenMPI library successfully on your machine? Also, please try to run
torchpack dist-run -np 1 -v python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
to see what command is printed, thanks!
Best, Haotian
Thanks for your instant reply. Previously I installed MPICH using conda; I just removed it and installed OpenMPI instead. Unfortunately, I got similar results even after adding -v, as shown below. Do you think this could be related to the first four claims in another issue, https://github.com/mit-han-lab/e3d/issues/2#issue-698773318? I also tried adding '-H hostname:slots', but it didn't help much. Thanks.
mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0:
mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs'
Hi,
I've checked the command you provided. It turns out that the BASH_FUNC_module() part of your command is causing the problem. As a temporary workaround, I suggest you run
export MASTER_HOST=127.0.0.1:[Arbitrary Port Number]
and
mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x "BASH_FUNC_module()" -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSION_ID -x _ -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
to see whether you can successfully run the inference. Note that I added a pair of double quotes around -x BASH_FUNC_module().
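For context, here is a minimal Python sketch of the failure mode (this is not torchpack's actual code, and the echo command and variable names are just stand-ins): the assembled mpirun command is handed to /bin/sh as a single string, and the shell rejects a bare, unquoted "()".

import subprocess

# Unquoted: /bin/sh sees a bare "()" and aborts with
# "syntax error near unexpected token `('", just like the log above.
bad = 'echo -x BASH_FUNC_module() -x PATH'
print(subprocess.run(bad, shell=True).returncode)   # non-zero exit code

# Quoted: the parentheses are passed through as a literal argument.
good = 'echo -x "BASH_FUNC_module()" -x PATH'
print(subprocess.run(good, shell=True).returncode)  # 0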
I should also point out that I tried running our code in an Anaconda environment, and I get a much shorter command compared with yours. I'll sync with Zhijian (the author of torchpack and a co-author of the SPVNAS project) on the details of the mpirun arguments. I hope this temporary solution works for you.
Best, Haotian
It works, and I can evaluate successfully following your instructions. You are right; it was likely because 'BASH_FUNC_module()' was not quoted. Thanks for your effort and time; I appreciate it very much.
Hi @HenryJunW, I've added a patch to the latest torchpack to fix this issue. Could you please install the latest version and give it a try? Please let me know if you have any questions. Thanks!
@zhijian-liu Works perfectly this time. Thanks.
To evaluate the pre-trained SPVCNN / MinkUNet models, the code at https://github.com/mit-han-lab/e3d/blob/4c494f211f36e368d291d39088c8122b34d1d227/spvnas/evaluate.py#L67-L69 is missing the model imports from model_zoo:
from model_zoo import spvcnn
from model_zoo import minkunet
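For reference, a minimal usage sketch once those imports are in place; the exact helper signature is my assumption, based on the pretrained-model names that evaluate.py already accepts via --name.

# Assumed usage sketch; passing the released model name as the argument
# is an assumption, not confirmed from evaluate.py itself.
from model_zoo import minkunet, spvcnn

model = spvcnn('SemanticKITTI_val_SPVCNN@119GMACs')  # pretrained SPVCNN
# model = minkunet(...)                              # analogous for MinkUNet
model = model.cuda().eval()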
Thanks for your reminder, I have fixed the code.
Cool. I have one more question: how long does it take to train the model SemanticKITTI_val_SPVCNN@119GMACs? Could you specify the number of GPUs and the training time? (For evaluation, a single 1080Ti is mentioned in the paper.) Also, would it be possible for you to release the training configuration files for the SPVNAS models? I am wondering whether to directly use the net.config file provided at https://hanlab.mit.edu/files/SPVNAS/spvnas_specialized/SemanticKITTI_val_SPVNAS@65GMACs/ as the YAML config file. Thanks.
Thanks for your wonderful work.
I got the following errors when evaluating with 'torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs':
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0:
mpirun --allow-run-as-root -np 1 -H localhost:1 -bind-to none -map-by slot -x BASH_FUNC_module() -x BLOCKSIZE -x CC -x CLICOLOR -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPATH -x CPLUS_INCLUDE_PATH -x CPPFLAGS -x CUDA_HOME -x CUDA_VISIBLE_DEVICES -x CXX -x EDITOR -x GPU_DEVICE_ORDINAL -x HISTCONTROL -x HISTSIZE -x HOME -x HOSTNAME -x KDEDIRS -x LANG -x LDFLAGS -x LD_LIBRARY_PATH -x LD_RUN_PATH -x LESSOPEN -x LIBRARY_PATH -x LOADEDMODULES -x LOGNAME -x LSCOLORS -x LS_COLORS -x MAIL -x MANPATH -x MASTER_HOST -x MODULEPATH -x MODULESHOME -x OLDPWD -x PATH -x PKG_CONFIG_PATH -x PPL_PATH -x PROJECT_HOME -x PS1 -x PWD -x PYTHONPATH -x QTDIR -x QTINC -x QTLIB -x QT_GRAPHICSSYSTEM_CHECKED -x QT_PLUGIN_PATH -x SELINUX_LEVEL_REQUESTED -x SELINUX_ROLE_REQUESTED -x SELINUX_USE_CURRENT_RANGE -x SHELL -x SHLVL -x SLURMD_NODENAME -x SLURM_CLUSTER_NAME -x SLURM_CPUS_ON_NODE -x SLURM_DISTRIBUTION -x SLURM_GTIDS -x SLURM_JOBID -x SLURM_JOB_ACCOUNT -x SLURM_JOB_CPUS_PER_NODE -x SLURM_JOB_GID -x SLURM_JOB_ID -x SLURM_JOB_NAME -x SLURM_JOB_NODELIST -x SLURM_JOB_NUM_NODES -x SLURM_JOB_PARTITION -x SLURM_JOB_QOS -x SLURM_JOB_UID -x SLURM_JOB_USER -x SLURM_LAUNCH_NODE_IPADDR -x SLURM_LOCALID -x SLURM_NNODES -x SLURM_NODEID -x SLURM_NODELIST -x SLURM_NPROCS -x SLURM_NTASKS -x SLURM_PRIO_PROCESS -x SLURM_PROCID -x SLURM_PTY_PORT -x SLURM_PTY_WIN_COL -x SLURM_PTY_WIN_ROW -x SLURM_SRUN_COMM_HOST -x SLURM_SRUN_COMM_PORT -x SLURM_STEPID -x SLURM_STEP_GPUS -x SLURM_STEP_ID -x SLURM_STEP_LAUNCHER_PORT -x SLURM_STEP_NODELIST -x SLURM_STEP_NUM_NODES -x SLURM_STEP_NUM_TASKS -x SLURM_STEP_TASKS_PER_NODE -x SLURM_SUBMIT_DIR -x SLURM_SUBMIT_HOST -x SLURM_TASKS_PER_NODE -x SLURM_TASK_PID -x SLURM_TOPOLOGY_ADDR -x SLURM_TOPOLOGY_ADDR_PATTERN -x SLURM_UMASK -x SLURM_WORKING_CLUSTER -x SRUN_DEBUG -x SSH_ASKPASS -x SSH_CLIENT -x SSH_CONNECTION -x SSH_TTY -x TERM -x TMPDIR -x TMUX -x TMUX_PANE -x USER -x WORKON_HOME -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSIONID -x -x _CE_CONDA -x _CE_M -x LMFILES -mca pml ob1 -mca btl ^openib -mca btl_tcp_if_exclude docker0,lo python -m torchpack.launch.assets.silentrun python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs'
I checked that torchpack and torchsparse are installed successfully. Any idea how to solve this issue? Thank you in advance.