The unexpected results still exist in the latest version.
Describe the Issue
A clear and concise description of what the bug is, including what results are expected and what the real results you got.
When I evaluate ImageNet VID val dataset in mmtracking, the processbar always stopped at 130968/176126 like follows, there is no error reported, I observerd the status of gpus and they are still in dynamic working. I have checked the data around it, there is no problem in data.
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Thanks for reporting the unexpected results and we appreciate it a lot.
Checklist
Describe the Issue A clear and concise description of what the bug is, including what results are expected and what the real results you got.
When I evaluate ImageNet VID val dataset in mmtracking, the processbar always stopped at 130968/176126 like follows, there is no error reported, I observerd the status of gpus and they are still in dynamic working. I have checked the data around it, there is no problem in data.
Reproduction
Did you make any modifications on the code? Did you understand what you have modified? None. Environment
Please run
python -c "from mmcv.utils import collect_env; print(collect_env())"
to collect necessary environment information and paste it here. {'sys.platform': 'linux', 'Python': '3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]', 'CUDA available': True, 'GPU 0,1,2,3,4,5,6,7': 'GeForce RTX 3090', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.1.TC455_06.29069683_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.8.0', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.1\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.0.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.9.0', 'OpenCV': '4.5.4-dev', 'MMCV': '1.3.17', 'MMCV Compiler': 'GCC 7.3', 'MMCV CUDA Compiler': '11.1'}You may add addition that may be helpful for locating the problem, such as
$PATH
,$LD_LIBRARY_PATH
,$PYTHONPATH
, etc.)Error traceback If applicable, paste the error traceback here.
Bug fix If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!