Closed chairmanQi closed 6 months ago
Thanks for raising this. Because the datasets have become more and more complex, this tool (case_analyzer) is outdated and has not been kept up to date, so we may drop it for now. We plan to provide an updated version in the near future, and you are very welcome to contribute this feature yourself. Feel free to reopen the issue if needed.
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True, 'CUDA_HOME': None, 'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0', 'GPU 0': 'NVIDIA GeForce GTX 1050 Ti', 'MMEngine': '0.10.4', 'MUSA available': False, 'OpenCV': '4.9.0', 'PyTorch': '2.3.0', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2023.1-Product Build 20230303 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.3.6 (Git Hash ' '86e6af5974177e513fd3fee58425e1063e7f1361)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX2\n' ' - CUDA Runtime 12.1\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n' ' - CuDNN 8.9.2\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=12.1, ' 'CUDNN_VERSION=8.9.2, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM ' '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=pedantic ' '-Wno-error=old-style-cast -Wno-missing-braces ' '-fdiagnostics-color=always -faligned-new ' 
'-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, ' 'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, ' 'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, ' 'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, ' 'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, ' 'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, ' 'USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]', 'TorchVision': '0.18.0', 'numpy_random_seed': 2147483648, 'opencompass': '0.2.4+81d0e4d', 'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
https://opencompass-zh-cn.readthedocs.io/zh-cn/latest/tools.html
Case Analyzer
Based on existing evaluation results, this tool outputs the samples with inference errors as well as the full set of samples with annotation information.
Usage: python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]
-w: working directory, defaults to './outputs/default'.
The corresponding argument definition in the script:
'-w', '--work-dir', help='Work path, all the outputs will be ' 'saved in this path, including the slurm logs, ' 'the evaluation results, the summary results, etc.' 'If not specified, the work_dir will be set to ' './outputs/default.',
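For readers unfamiliar with the invocation above, here is a minimal sketch of a command-line parser that accepts the same shape of arguments (`CONFIG_PATH` plus an optional `-w`/`--work-dir`). It is not the actual code of `tools/case_analyzer.py`; the function name `build_parser`, the shortened help strings, and setting the default directly on the argument are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring: case_analyzer.py CONFIG_PATH [-w WORK_DIR]."""
    parser = argparse.ArgumentParser(description="Sketch of a case-analyzer style CLI")
    # Positional argument: path to the evaluation config.
    parser.add_argument("config", help="Path to the evaluation config file")
    # Optional working directory; the default here is an assumption taken
    # from the documented fallback './outputs/default'.
    parser.add_argument(
        "-w", "--work-dir",
        default="./outputs/default",
        help="Work path where all outputs are saved",
    )
    return parser


# Parse the same arguments as the command reported in this issue.
args = build_parser().parse_args(["configs/eval_demo.py"])
print(args.config, args.work_dir)
```

With no `-w` given, `args.work_dir` falls back to `./outputs/default`, which is where the tool would look for the results of an earlier evaluation run.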
Reproduces the problem - command or script
python tools/case_analyzer.py configs/eval_demo.py
Reproduces the problem - error message
Traceback (most recent call last):
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 201, in
    main()  # call the main function
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 198, in main
    dispatch_tasks(cfg, force=args.force)  # dispatch tasks
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 188, in dispatch_tasks
    }).run()
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 101, in run
    references = dataset[self.ds_column]
TypeError: 'siqaDataset_V2' object is not subscriptable
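The `TypeError` above means the object being indexed does not define `__getitem__`. The sketch below reproduces that failure with a stand-in class and shows one way around it. `WrappedDataset`, its `test` attribute, and the `column` helper are all hypothetical; whether `siqaDataset_V2` exposes its data under a similarly named attribute is an assumption that would need to be checked against the OpenCompass source.

```python
class WrappedDataset:
    """Hypothetical stand-in for a dataset wrapper such as siqaDataset_V2."""

    def __init__(self, rows):
        # Assumption: the wrapper stores the actual split in an attribute
        # (named `test` here) and does not define __getitem__.
        self.test = rows


def column(rows, name):
    """Collect one column from a list of dict rows."""
    return [row[name] for row in rows]


dataset = WrappedDataset([
    {"question": "q1", "label": "A"},
    {"question": "q2", "label": "B"},
])

try:
    dataset["label"]  # same pattern as `dataset[self.ds_column]`
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Indexing the underlying split instead of the wrapper works.
references = column(dataset.test, "label")
print(references)
```

If the real wrapper behaves like this stand-in, the failing line would need to index the underlying split rather than the wrapper object itself.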
Other information
1. What's your expected result?: I want to know how to use case_analyzer.py to list bad cases or good cases.
2. What dataset did you use?: siqa_gen and winograd_ppl. The cases come from running “python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --debug”.
I'm really confused. I hope to get help from the developers or others!
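As a rough illustration of what listing good and bad cases could look like without the tool, here is a minimal sketch that compares predictions with references by exact match. The function name `split_cases`, the record fields, and the sample data are hypothetical; the real result files written by OpenCompass may be structured differently and would have to be loaded first.

```python
def split_cases(predictions, references):
    """Split samples into good (prediction == reference) and bad cases."""
    good, bad = [], []
    for index, (pred, ref) in enumerate(zip(predictions, references)):
        case = {"index": index, "prediction": pred, "reference": ref}
        if pred == ref:
            good.append(case)
        else:
            bad.append(case)
    return good, bad


# Made-up data purely for illustration.
predictions = ["A", "C", "B"]
references = ["A", "B", "B"]

good_cases, bad_cases = split_cases(predictions, references)
print("good:", [c["index"] for c in good_cases])
print("bad:", [c["index"] for c in bad_cases])
```

Exact string match is the simplest possible criterion; datasets scored with other metrics would need a different comparison.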