Open Jimmy-L99 opened 3 days ago
Could you check what this log says: outputs/default/20241126_105417/logs/infer/glm-4-9b-chat-vllm/cmmlu-agronomy.out
/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/__init__.py:17: UserWarning: Starting from v0.4.0, all AMOTIC configuration files currently located in `./configs/datasets`, `./configs/models`, and `./configs/summarizers` will be migrated to the `opencompass/configs/` package. Please update your configuration file paths accordingly.
_warn_about_config_migration()
11/26 10:54:27 - OpenCompass - INFO - Task [glm-4-9b-chat-vllm/cmmlu-agronomy,glm-4-9b-chat-vllm/cmmlu-anatomy,glm-4-9b-chat-vllm/cmmlu-ancient_chinese,glm-4-9b-chat-vllm/cmmlu-arts,glm-4-9b-chat-vllm/cmmlu-astronomy,glm-4-9b-chat-vllm/cmmlu-business_ethics,glm-4-9b-chat-vllm/cmmlu-chinese_civil_service_exam,glm-4-9b-chat-vllm/cmmlu-chinese_driving_rule,glm-4-9b-chat-vllm/cmmlu-chinese_food_culture,glm-4-9b-chat-vllm/cmmlu-chinese_foreign_policy,glm-4-9b-chat-vllm/cmmlu-chinese_history,glm-4-9b-chat-vllm/cmmlu-chinese_literature,glm-4-9b-chat-vllm/cmmlu-chinese_teacher_qualification,glm-4-9b-chat-vllm/cmmlu-clinical_knowledge,glm-4-9b-chat-vllm/cmmlu-college_actuarial_science,glm-4-9b-chat-vllm/cmmlu-college_education,glm-4-9b-chat-vllm/cmmlu-college_engineering_hydrology,glm-4-9b-chat-vllm/cmmlu-college_law,glm-4-9b-chat-vllm/cmmlu-college_mathematics,glm-4-9b-chat-vllm/cmmlu-college_medical_statistics,glm-4-9b-chat-vllm/cmmlu-college_medicine,glm-4-9b-chat-vllm/cmmlu-computer_science,glm-4-9b-chat-vllm/cmmlu-computer_security,glm-4-9b-chat-vllm/cmmlu-conceptual_physics,glm-4-9b-chat-vllm/cmmlu-construction_project_management,glm-4-9b-chat-vllm/cmmlu-economics,glm-4-9b-chat-vllm/cmmlu-education,glm-4-9b-chat-vllm/cmmlu-electrical_engineering,glm-4-9b-chat-vllm/cmmlu-elementary_chinese,glm-4-9b-chat-vllm/cmmlu-elementary_commonsense,glm-4-9b-chat-vllm/cmmlu-elementary_information_and_technology,glm-4-9b-chat-vllm/cmmlu-elementary_mathematics,glm-4-9b-chat-vllm/cmmlu-ethnology,glm-4-9b-chat-vllm/cmmlu-food_science,glm-4-9b-chat-vllm/cmmlu-genetics,glm-4-9b-chat-vllm/cmmlu-global_facts,glm-4-9b-chat-vllm/cmmlu-high_school_biology,glm-4-9b-chat-vllm/cmmlu-high_school_chemistry,glm-4-9b-chat-vllm/cmmlu-high_school_geography,glm-4-9b-chat-vllm/cmmlu-high_school_mathematics,glm-4-9b-chat-vllm/cmmlu-high_school_physics,glm-4-9b-chat-vllm/cmmlu-high_school_politics,glm-4-9b-chat-vllm/cmmlu-human_sexuality,glm-4-9b-chat-vllm/cmmlu-international_law,glm-4-9b-chat-vllm/cmmlu-j
ournalism,glm-4-9b-chat-vllm/cmmlu-jurisprudence,glm-4-9b-chat-vllm/cmmlu-legal_and_moral_basis,glm-4-9b-chat-vllm/cmmlu-logical,glm-4-9b-chat-vllm/cmmlu-machine_learning,glm-4-9b-chat-vllm/cmmlu-management,glm-4-9b-chat-vllm/cmmlu-marketing,glm-4-9b-chat-vllm/cmmlu-marxist_theory,glm-4-9b-chat-vllm/cmmlu-modern_chinese,glm-4-9b-chat-vllm/cmmlu-nutrition,glm-4-9b-chat-vllm/cmmlu-philosophy,glm-4-9b-chat-vllm/cmmlu-professional_accounting,glm-4-9b-chat-vllm/cmmlu-professional_law,glm-4-9b-chat-vllm/cmmlu-professional_medicine,glm-4-9b-chat-vllm/cmmlu-professional_psychology,glm-4-9b-chat-vllm/cmmlu-public_relations,glm-4-9b-chat-vllm/cmmlu-security_study,glm-4-9b-chat-vllm/cmmlu-sociology,glm-4-9b-chat-vllm/cmmlu-sports_science,glm-4-9b-chat-vllm/cmmlu-traditional_chinese_medicine,glm-4-9b-chat-vllm/cmmlu-virology,glm-4-9b-chat-vllm/cmmlu-world_history,glm-4-9b-chat-vllm/cmmlu-world_religions]
INFO 11-26 10:54:29 config.py:724] Defaulting to use mp for distributed inference
WARNING 11-26 10:54:29 arg_utils.py:762] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
--- Logging error ---
Traceback (most recent call last):
File "/root/anaconda3/envs/opencompass/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/root/anaconda3/envs/opencompass/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/logging/formatter.py", line 11, in format
msg = logging.Formatter.format(self, record)
File "/root/anaconda3/envs/opencompass/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/root/anaconda3/envs/opencompass/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: %d format: a real number is required, not NoneType
Call stack:
File "/root/ljm/OpenCompass/opencompass/opencompass/tasks/openicl_infer.py", line 161, in <module>
inferencer.run()
File "/root/ljm/OpenCompass/opencompass/opencompass/tasks/openicl_infer.py", line 73, in run
self.model = build_model_from_cfg(model_cfg)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/utils/build.py", line 25, in build_model_from_cfg
return MODELS.build(model_cfg)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/models/vllm_with_tf_above_v4_33.py", line 46, in __init__
self._load_model(path, model_kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/models/vllm_with_tf_above_v4_33.py", line 64, in _load_model
self.model = LLM(path, **model_kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 155, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 438, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 802, in create_engine_config
scheduler_config = SchedulerConfig(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/config.py", line 815, in __init__
logger.info(
Message: 'Chunked prefill is enabled with max_num_batched_tokens=%d.'
Arguments: (None,)
INFO 11-26 10:54:29 llm_engine.py:176] Initializing an LLM engine (v0.5.3) with config: model='/root/ljm/models/glm-4-9b-chat', speculative_config=None, tokenizer='/root/ljm/models/glm-4-9b-chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/root/ljm/models/glm-4-9b-chat, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 11-26 10:54:30 tokenizer.py:129] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 11-26 10:54:30 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=940435) INFO 11-26 10:54:30 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method, Traceback (most recent call last):
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/worker/worker.py", line 123, in init_device
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in set_device
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/cuda/__init__.py", line 279, in _lazy_init
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] raise RuntimeError(
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
(VllmWorkerProcess pid=940435) ERROR 11-26 10:54:30 multiproc_worker_utils.py:226]
Traceback (most recent call last):
File "/root/ljm/OpenCompass/opencompass/opencompass/tasks/openicl_infer.py", line 161, in <module>
inferencer.run()
File "/root/ljm/OpenCompass/opencompass/opencompass/tasks/openicl_infer.py", line 73, in run
self.model = build_model_from_cfg(model_cfg)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/utils/build.py", line 25, in build_model_from_cfg
return MODELS.build(model_cfg)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/models/vllm_with_tf_above_v4_33.py", line 46, in __init__
self._load_model(path, model_kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/opencompass/models/vllm_with_tf_above_v4_33.py", line 64, in _load_model
self.model = LLM(path, **model_kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 155, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 441, in from_engine_args
engine = cls(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
self.model_executor = executor_class(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
super().__init__(*args, **kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 47, in __init__
self._init_executor()
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 123, in _init_executor
self._run_workers("init_device")
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 178, in _run_workers
driver_worker_output = driver_worker_method(*args, **kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/worker/worker.py", line 132, in init_device
init_worker_distributed_environment(self.parallel_config, self.rank,
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/worker/worker.py", line 343, in init_worker_distributed_environment
init_distributed_environment(parallel_config.world_size, rank,
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 812, in init_distributed_environment
torch.distributed.init_process_group(
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
return func(*args, **kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
func_return = func(*args, **kwargs)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
File "/root/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
return TCPStore(
torch.distributed.DistStoreError: Timed out after 601 seconds waiting for clients. 1/2 clients joined.
INFO 11-26 11:04:31 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x000000002e309460)
Current thread 0x00007f96629b94c0 (most recent call first):
<no Python frame>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, simplejson._speedups, requests.packages.charset_normalizer.md, requests.packages.chardet.md, markupsafe._speedups, PIL._imaging, sentencepiece._sentencepiece, PIL._imagingft, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, 
scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, 
sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, _cffi_backend, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, regex._regex (total: 151)
Aborted (core dumped)
The bug appears to be in vLLM rather than OpenCompass. Please check the vLLM project for more information.
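For anyone hitting the same worker error: the log says "Cannot re-initialize CUDA in forked subprocess", which PyTorch raises when a process that has already initialized CUDA forks a child and the child then touches CUDA. The message itself asks for the 'spawn' start method. Below is a minimal sketch of one way to request that. It assumes the installed vLLM honors the `VLLM_WORKER_MULTIPROC_METHOD` environment variable (please verify this for v0.5.3), and the helper name `request_spawn_workers` is made up for illustration. Nothing in this thread confirms that it resolves this particular run.

```python
import os
from typing import MutableMapping, Optional


def request_spawn_workers(
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set the variable that (by assumption) tells vLLM to spawn its workers.

    Hypothetical helper: call it before vLLM is imported or any CUDA call
    is made in the parent process, otherwise it is too late to matter.
    """
    target = os.environ if env is None else env
    # Assumption: vLLM reads this variable and accepts "spawn" or "fork".
    target["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    return target
```

The same effect should be achievable by exporting the variable in the shell before launching the OpenCompass run, which avoids touching any code.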
Type
I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.
Other information
I'm not sure what went wrong. Has anyone run into something similar?
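A note on the first traceback in the log, since it is easy to mistake for the cause: the "--- Logging error ---" block is Python's logging module failing to format a message, not the failure that stopped the run. The log shows the template 'Chunked prefill is enabled with max_num_batched_tokens=%d.' being given the arguments `(None,)`, and `%d` cannot format `None`. Engine initialization continues after it, and the run only stops later at the worker error and the rendezvous timeout. A small sketch that reproduces the same formatting failure in isolation (the function name `format_log_message` is made up for illustration):

```python
def format_log_message(template: str, args: tuple) -> str:
    """Apply %-style arguments the way logging's getMessage() does.

    Returns the formatted message, or a short note if formatting fails,
    mirroring how logging reports the error instead of crashing the program.
    """
    try:
        return template % args
    except TypeError as exc:
        return f"logging error: {exc}"


template = "Chunked prefill is enabled with max_num_batched_tokens=%d."
print(format_log_message(template, (2048,)))  # formats normally
print(format_log_message(template, (None,)))  # reports the TypeError
```

The value 2048 is an arbitrary placeholder, not a recommended setting.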