open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.09k stars, 436 forks

[Bug] case_analyzer.py run error: TypeError: 'siqaDataset_V2' object is not subscriptable #1086

Closed: chairmanQi closed this issue 6 months ago

chairmanQi commented 6 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': None,
 'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
 'GPU 0': 'NVIDIA GeForce GTX 1050 Ti',
 'MMEngine': '0.10.4',
 'MUSA available': False,
 'OpenCV': '4.9.0',
 'PyTorch': '2.3.0',
 'PyTorch compiling details': 'PyTorch built with:\n'
     ' - GCC 9.3\n'
     ' - C++ Version: 201703\n'
     ' - Intel(R) oneAPI Math Kernel Library Version '
     '2023.1-Product Build 20230303 for Intel(R) 64 '
     'architecture applications\n'
     ' - Intel(R) MKL-DNN v3.3.6 (Git Hash '
     '86e6af5974177e513fd3fee58425e1063e7f1361)\n'
     ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
     ' - LAPACK is enabled (usually provided by '
     'MKL)\n'
     ' - NNPACK is enabled\n'
     ' - CPU capability usage: AVX2\n'
     ' - CUDA Runtime 12.1\n'
     ' - NVCC architecture flags: '
     '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
     ' - CuDNN 8.9.2\n'
     ' - Magma 2.6.1\n'
     ' - Build settings: BLAS_INFO=mkl, '
     'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
     'CUDNN_VERSION=8.9.2, '
     'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
     'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
     '-fabi-version=11 -fvisibility-inlines-hidden '
     '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
     '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
     '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
     '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
     '-O2 -fPIC -Wall -Wextra -Werror=return-type '
     '-Werror=non-virtual-dtor -Werror=bool-operation '
     '-Wnarrowing -Wno-missing-field-initializers '
     '-Wno-type-limits -Wno-array-bounds '
     '-Wno-unknown-pragmas -Wno-unused-parameter '
     '-Wno-unused-function -Wno-unused-result '
     '-Wno-strict-overflow -Wno-strict-aliasing '
     '-Wno-stringop-overflow -Wsuggest-override '
     '-Wno-psabi -Wno-error=pedantic '
     '-Wno-error=old-style-cast -Wno-missing-braces '
     '-fdiagnostics-color=always -faligned-new '
     '-Wno-unused-but-set-variable '
     '-Wno-maybe-uninitialized -fno-math-errno '
     '-fno-trapping-math -Werror=format '
     '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
     'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
     'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, '
     'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
     'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
     'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
     'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
     'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
     'USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]',
 'TorchVision': '0.18.0',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.2.4+81d0e4d',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

https://opencompass-zh-cn.readthedocs.io/zh-cn/latest/tools.html

Case Analyzer: building on existing evaluation results, this tool outputs the incorrectly inferred samples as well as the full set of samples with annotation information.

Usage: python tools/case_analyzer.py CONFIG_PATH [-w WORK_DIR]

-w: working directory, defaults to './outputs/default'. The argument is defined as:

'-w', '--work-dir',
help='Work path, all the outputs will be '
'saved in this path, including the slurm logs, '
'the evaluation results, the summary results, etc. '
'If not specified, the work_dir will be set to '
'./outputs/default.',
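As a rough illustration of the documented command-line interface, here is a minimal sketch using Python's standard argparse module. It is a hypothetical re-creation, not the actual parser in tools/case_analyzer.py: `build_parser` is an illustrative name, the script's other options (it also reads `args.force`) are omitted, and './outputs/default' is set directly as the default, whereas the real help text suggests the default may be applied later in code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of: case_analyzer.py CONFIG_PATH [-w WORK_DIR]."""
    parser = argparse.ArgumentParser(
        description='Output bad cases and fully annotated samples '
        'from existing evaluation results.')
    # Positional config path, shown as CONFIG_PATH in the usage string.
    parser.add_argument('config', metavar='CONFIG_PATH',
                        help='Path to the evaluation config file')
    # -w/--work-dir is stored as args.work_dir.
    parser.add_argument(
        '-w', '--work-dir', metavar='WORK_DIR',
        default='./outputs/default',  # simplification: default applied here
        help='Work path where all outputs are saved. '
        'Defaults to ./outputs/default.')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['configs/eval_demo.py'])
    print(args.config, args.work_dir)  # configs/eval_demo.py ./outputs/default
```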

Reproduces the problem - command or script

python tools/case_analyzer.py configs/eval_demo.py

Reproduces the problem - error message

Traceback (most recent call last):
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 201, in <module>
    main()  # call the main function
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 198, in main
    dispatch_tasks(cfg, force=args.force)  # dispatch tasks
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 188, in dispatch_tasks
    }).run()
  File "/home/qwq/opencompass/tools/case_analyzer.py", line 101, in run
    references = dataset[self.ds_column]
TypeError: 'siqaDataset_V2' object is not subscriptable
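The last frame shows where it fails: `dataset` is the dataset object itself (here `siqaDataset_V2`), and that class does not define `__getitem__`, so `dataset[self.ds_column]` raises TypeError. OpenCompass dataset classes derived from `BaseDataset` appear to expose their splits through properties such as `test` (verify this against your installed version). Assuming so, indexing the split, e.g. `dataset.test[self.ds_column]`, is a plausible local workaround, untested against the real tool. The sketch below uses hypothetical stand-in classes (`SplitReader`, `DatasetWrapper`, `get_references`), not OpenCompass code, to reproduce the error and the workaround without dependencies.

```python
class SplitReader:
    """Hypothetical stand-in for a reader that holds named splits."""

    def __init__(self, splits):
        self.dataset = splits  # e.g. {'test': {column_name: [values]}}


class DatasetWrapper:
    """Hypothetical stand-in for a dataset class such as siqaDataset_V2.

    It exposes splits via a `test` property but defines no __getitem__,
    so `obj[column]` raises TypeError, just like in the traceback.
    """

    def __init__(self, splits):
        self.reader = SplitReader(splits)

    @property
    def test(self):
        return self.reader.dataset['test']


def get_references(dataset, column):
    # Index the test split, not the wrapper object itself.
    return dataset.test[column]


ds = DatasetWrapper({'test': {'label': ['A', 'B', 'C']}})

try:
    ds['label']  # mirrors `references = dataset[self.ds_column]`
except TypeError as err:
    print(err)  # 'DatasetWrapper' object is not subscriptable

print(get_references(ds, 'label'))  # ['A', 'B', 'C']
```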

Other information

1. What's your expected result? How do I use case_analyzer.py to list bad cases or good cases?
2. What dataset did you use? siqa_gen and winograd_ppl, from "python run.py --models hf_opt_125m --datasets siqa_gen winograd_ppl --debug".

I'm really confused and hope to get help from the developers or anyone else!

bittersweet1999 commented 6 months ago

Thanks for your interest. Because the datasets have become increasingly complex, this tool (case_analyzer) is outdated and has not been kept up to date, so it may be deprecated for now. We plan to release an updated version in the near future, and you are very welcome to contribute this feature yourself. Feel free to reopen this issue if needed.
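Until an updated tool is available, one way to get a rough list of good and bad cases is to compare predictions with references yourself. The sketch below is a hypothetical helper (`split_cases`), not part of OpenCompass. It assumes the per-dataset predictions have been loaded into a dict keyed by numeric sample index, with each record containing `prediction` and `gold` fields; check the actual field names in your outputs directory first. Exact string matching is a simplification: gen-style outputs such as siqa_gen usually need the answer extracted from free text before comparison.

```python
import json
from typing import Dict, List, Tuple


def split_cases(records: Dict[str, dict]) -> Tuple[List[dict], List[dict]]:
    """Split records into (good, bad) by exact match of prediction vs. gold.

    Assumes numeric string keys and 'prediction'/'gold' fields per record.
    """
    good, bad = [], []
    # Sort by numeric index so the output order is stable.
    for idx, rec in sorted(records.items(), key=lambda kv: int(kv[0])):
        pred = str(rec.get('prediction', '')).strip()
        gold = str(rec.get('gold', '')).strip()
        case = {'id': idx, 'prediction': pred, 'gold': gold}
        (good if pred == gold else bad).append(case)
    return good, bad


if __name__ == '__main__':
    # Toy data in the assumed format; in practice, json.load() a predictions file.
    sample = json.loads('{"0": {"prediction": "A", "gold": "A"}, '
                        '"1": {"prediction": "B", "gold": "C"}}')
    good, bad = split_cases(sample)
    print(len(good), len(bad))  # 1 1
```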