Closed: sriyachakravarthy closed this issue 2 months ago.
Also, I am getting the following error for truthfulqa:
AssertionError: truth_model should be set to perform API eval. If you want to perform basic metric eval, please refer to the docstring of /rhome/sriyar/Sriya/opencompass/opencompass/datasets/truthfulqa.py for more details.
Inference Time/Execution Time: 3044 seconds
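For context, this assertion is raised by the TruthfulQA evaluator that the path above points to: its API-based metrics need an external judge, which is why truth_model must be set, while the basic metrics run locally. Below is a rough sketch of an eval config for basic metric eval; the argument names (metrics, and the truth_model mentioned in the error) are taken from the error text and the referenced docstring, so verify them against your copy of opencompass/datasets/truthfulqa.py.

# Sketch only: configure TruthfulQA for basic (local) metric eval so that
# no API judge model is required. Argument names follow the docstring the
# error points to and may differ between OpenCompass versions (assumption).
from opencompass.datasets import TruthfulQAEvaluator

truthfulqa_eval_cfg = dict(
    evaluator=dict(
        type=TruthfulQAEvaluator,
        metrics=('bleurt', 'rouge', 'bleu'),  # local metrics, no judge model
        # For API eval, metrics such as ('truth', 'info') would additionally
        # require a configured truth_model, per the assertion above.
    ),
)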
For the base model, we recommend using perplexity (ppl) for evaluation on multiple-choice questions.
python -u run.py --datasets commonsenseqa_ppl --hf-num-gpus 1 --hf-type base --hf-path meta-llama/Meta-Llama-3-8B --debug --model-kwargs device_map='auto' trust_remote_code=True --batch-size 8
dataset version metric mode Meta-Llama-3-8B_hf
-------------- --------- -------- ------ --------------------
commonsense_qa 554500.00 accuracy ppl 70.19
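For reference, the same run can also be written as a Python config instead of CLI flags. This is only a sketch: the dataset import matches the configs/datasets/commonsenseqa/commonsenseqa_ppl.py file loaded in the log further down, while the model config name (hf_llama3_8b) and the exported variable names are assumptions that may differ between OpenCompass versions.

# Hypothetical configs/eval_llama3_commonsenseqa_ppl.py, sketching the CLI
# command above as an OpenCompass config file.
from mmengine.config import read_base

with read_base():
    # CommonsenseQA ppl dataset config, as loaded in the log below.
    from .datasets.commonsenseqa.commonsenseqa_ppl import commonsenseqa_datasets
    # Stock HF model config for Meta-Llama-3-8B (file name is an assumption).
    from .models.hf_llama.hf_llama3_8b import models

datasets = commonsenseqa_datasets

Run it with: python run.py configs/eval_llama3_commonsenseqa_ppl.py --debug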
Feel free to re-open if needed.
Hi, could you please help check why the results are blank?
(opencompass) [~/project/LLM/opencompass]$ python -u run.py --datasets commonsenseqa_ppl --hf-num-gpus 1 --hf-type base --hf-path meta-llama/Meta-Llama-3-8B --debug --model-kwargs device_map='auto' trust_remote_code=True --batch-size 8
/home/xxxx/miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
08/21 21:05:26 - OpenCompass - INFO - Loading commonsenseqa_ppl: configs/datasets/commonsenseqa/commonsenseqa_ppl.py
08/21 21:05:26 - OpenCompass - DEBUG - Using model: {'type': 'opencompass.models.huggingface_above_v4_33.HuggingFaceBaseModel', 'abbr': 'Meta-Llama-3-8B_hf', 'path': 'meta-llama/Meta-Llama-3-8B', 'model_kwargs': {'device_map': 'auto', 'trust_remote_code': True}, 'tokenizer_path': None, 'tokenizer_kwargs': {}, 'generation_kwargs': {}, 'peft_path': None, 'peft_kwargs': {}, 'max_seq_len': None, 'max_out_len': 256, 'batch_size': 8, 'pad_token_id': None, 'stop_words': [], 'run_cfg': {'num_gpus': 1}}
08/21 21:05:26 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
08/21 21:05:26 - OpenCompass - INFO - Current exp folder: outputs/default/20240821_210526
08/21 21:05:26 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
08/21 21:05:26 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
08/21 21:05:26 - OpenCompass - DEBUG - Get class NumWorkerPartitioner from "partitioner" registry in "opencompass"
08/21 21:05:26 - OpenCompass - DEBUG - An NumWorkerPartitioner instance is built from registry, and its implementation can be found in opencompass.partitioners.num_worker
08/21 21:05:26 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
08/21 21:05:26 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
08/21 21:05:26 - OpenCompass - DEBUG - Key eval.given_pred not found in config, ignored.
08/21 21:05:26 - OpenCompass - DEBUG - Key eval.runner.task.cal_extract_rate not found in config, ignored.
08/21 21:05:26 - OpenCompass - DEBUG - Additional config: {}
08/21 21:05:26 - OpenCompass - INFO - Partitioned into 1 tasks.
08/21 21:05:26 - OpenCompass - DEBUG - Task 0: [Meta-Llama-3-8B_hf/commonsense_qa]
08/21 21:05:26 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
08/21 21:05:26 - OpenCompass - DEBUG - Get class LocalRunner from "runner" registry in "opencompass"
08/21 21:05:26 - OpenCompass - DEBUG - An LocalRunner instance is built from registry, and its implementation can be found in opencompass.runners.local
08/21 21:05:26 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
08/21 21:05:26 - OpenCompass - DEBUG - Get class OpenICLInferTask from "task" registry in "opencompass"
08/21 21:05:26 - OpenCompass - DEBUG - An OpenICLInferTask instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
08/21 21:05:27 - OpenCompass - WARNING - Only use 1 GPUs for total 2 available GPUs in debug mode.
08/21 21:05:27 - OpenCompass - DEBUG - Debug mode, log will be saved to tmp/3412580_debug.log
08/21 21:05:33 - OpenCompass - DEBUG - Get class NaivePartitioner from "partitioner" registry in "opencompass"
08/21 21:05:33 - OpenCompass - DEBUG - An NaivePartitioner instance is built from registry, and its implementation can be found in opencompass.partitioners.naive
08/21 21:05:33 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
08/21 21:05:33 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
08/21 21:05:33 - OpenCompass - DEBUG - Key eval.given_pred not found in config, ignored.
08/21 21:05:33 - OpenCompass - DEBUG - Key eval.runner.task.cal_extract_rate not found in config, ignored.
08/21 21:05:33 - OpenCompass - DEBUG - Additional config: {'eval': {'runner': {'task': {}}}}
08/21 21:05:33 - OpenCompass - INFO - Partitioned into 1 tasks.
08/21 21:05:33 - OpenCompass - DEBUG - Task 0: [Meta-Llama-3-8B_hf/commonsense_qa]
08/21 21:05:33 - OpenCompass - DEBUG - Get class LocalRunner from "runner" registry in "opencompass"
08/21 21:05:33 - OpenCompass - DEBUG - An LocalRunner instance is built from registry, and its implementation can be found in opencompass.runners.local
08/21 21:05:33 - OpenCompass - DEBUG - Get class OpenICLEvalTask from "task" registry in "opencompass"
08/21 21:05:33 - OpenCompass - DEBUG - An OpenICLEvalTask instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
08/21 21:05:34 - OpenCompass - DEBUG - Modules of opencompass's load_dataset registry have been automatically imported from opencompass.datasets
08/21 21:05:34 - OpenCompass - DEBUG - Get class commonsenseqaDataset from "load_dataset" registry in "opencompass"
08/21 21:05:34 - OpenCompass - DEBUG - An commonsenseqaDataset instance is built from registry, and its implementation can be found in opencompass.datasets.commonsenseqa
08/21 21:05:34 - OpenCompass - ERROR - /home/xxxx/project/LLM/opencompass/opencompass/tasks/openicl_eval.py - _score - 253 - Task [Meta-Llama-3-8B_hf/commonsense_qa]: No predictions found.
08/21 21:05:34 - OpenCompass - DEBUG - An DefaultSummarizer instance is built from registry, and its implementation can be found in opencompass.summarizers.default
dataset         version    metric    mode    Meta-Llama-3-8B_hf
--------------  ---------  --------  ------  --------------------
commonsense_qa  -          -         -       -
08/21 21:05:34 - OpenCompass - INFO - write summary to /home/xxxx/project/LLM/opencompass/outputs/default/20240821_210526/summary/summary_20240821_210526.txt
08/21 21:05:34 - OpenCompass - INFO - write csv to /home/xxxx/project/LLM/opencompass/outputs/default/20240821_210526/summary/summary_20240821_210526.csv
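The blank row above follows directly from the ERROR line: the eval task found no prediction files, so there was nothing to score. A first check is to see what the inference stage actually wrote under the experiment folder printed in the log; the snippet below is a small sketch that assumes the usual predictions/ and logs/infer/ layout of an OpenCompass run directory.

# Sketch: list whatever the inference stage produced for this run, using the
# experiment folder printed in the log above. The directory layout
# (predictions/, logs/infer/) is the usual OpenCompass output structure
# and may vary between versions.
from pathlib import Path

run_dir = Path('outputs/default/20240821_210526')
for sub in ('predictions', 'logs/infer'):
    root = run_dir / sub
    print(f'--- {root} ---')
    if not root.exists():
        print('(missing)')
        continue
    for f in sorted(root.rglob('*')):
        if f.is_file():
            print(f'{f.relative_to(run_dir)}  {f.stat().st_size} bytes')

If predictions/ is empty, the infer log for commonsense_qa should show why the inference stage produced no output.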
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': False, 'GCC': 'gcc (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0', 'MMEngine': '0.10.4', 'MUSA available': False, 'OpenCV': '4.10.0', 'PyTorch': '2.4.0+rocm6.1', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2022.2-Product Build 20220804 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.4.2 (Git Hash ' '1137e04ec0b5251ca2b4400a4fd3c667ce843d67)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOCUPTI -DUSE_FBGEMM ' '-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK ' '-DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC ' '-Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-unused-function -Wno-unused-result ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=pedantic ' '-Wno-error=old-style-cast -Wno-missing-braces ' '-fdiagnostics-color=always -faligned-new ' '-Wno-unused-but-set-variable ' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'PERF_WITH_AVX512=1, TORCH_VERSION=2.4.0, ' 'USE_CUDA=OFF, USE_CUDNN=OFF, ' 'USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, ' 'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, ' 'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, ' 'USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]', 'TorchVision': '0.19.0+rocm6.1', 'lmdeploy': "not installed:No module named 'lmdeploy'", 'numpy_random_seed': 2147483648, 'opencompass': '0.3.0+88eb912', 'sys.platform': 'linux', 'transformers': '4.44.0'}
Reproduces the problem - code/configuration sample
CUDA_VISIBLE_DEVICES=0 python -u run.py --datasets commonsenseqa_gen --hf-num-gpus 1 --hf-type base --hf-path meta-llama/Meta-Llama-3-8B --debug --model-kwargs device_map='auto' trust_remote_code=True --batch-size 1
Reproduces the problem - command or script
outputs/default/20240809_090910/results/Meta-Llama-3-8B_hf/commonsense_qa.json

dataset         version    metric    mode    Meta-Llama-3-8B_hf
--------------  ---------  --------  ------  --------------------
commonsense_qa  c946f2     accuracy  gen     0.00
Reproduces the problem - error message
outputs/default/20240809_090910/results/Meta-Llama-3-8B_hf/commonsense_qa.json

dataset         version    metric    mode    Meta-Llama-3-8B_hf
--------------  ---------  --------  ------  --------------------
commonsense_qa  c946f2     accuracy  gen     0.00
Other information
No response