Open ggbetz opened 3 months ago
Looks like a tricky one here, will look into where this is coming from:
2024-05-10T22:36:35.181385695Z INFO 05-10 22:36:35 selector.py:16] Using FlashAttention backend.
2024-05-10T22:36:36.885485950Z (RayWorkerVllm pid=7595) INFO 05-10 22:36:36 selector.py:16] Using FlashAttention backend.
2024-05-10T22:36:36.885536750Z (RayWorkerVllm pid=7595) INFO 05-10 22:36:36 pynccl_utils.py:45] vLLM is using nccl==2.18.1
2024-05-10T22:36:36.885543520Z INFO 05-10 22:36:36 pynccl_utils.py:45] vLLM is using nccl==2.18.1
2024-05-10T22:36:41.312005618Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution.
2024-05-10T22:36:41.312037008Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] Traceback (most recent call last):
2024-05-10T22:36:41.312043168Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method
2024-05-10T22:36:41.312049198Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] return executor(*args, **kwargs)
2024-05-10T22:36:41.312054648Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
2024-05-10T22:36:41.312060448Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.model_runner.load_model()
2024-05-10T22:36:41.312065698Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
2024-05-10T22:36:41.312071528Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.model = get_model(
2024-05-10T22:36:41.312098688Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 91, in get_model
2024-05-10T22:36:41.312104668Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] model = model_class(model_config.hf_config, linear_method)
2024-05-10T22:36:41.312110048Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 270, in __init__
2024-05-10T22:36:41.312115687Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.transformer = JAISModel(config, linear_method)
2024-05-10T22:36:41.312120977Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 230, in __init__
2024-05-10T22:36:41.312126507Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.h = nn.ModuleList([
2024-05-10T22:36:41.312132687Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 231, in <listcomp>
2024-05-10T22:36:41.312138737Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] JAISBlock(config, linear_method)
2024-05-10T22:36:41.312144097Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 183, in __init__
2024-05-10T22:36:41.312149707Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.mlp = JAISMLP(inner_dim, config, linear_method)
2024-05-10T22:36:41.312155027Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 137, in __init__
2024-05-10T22:36:41.312160747Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.c_fc = ColumnParallelLinear(
2024-05-10T22:36:41.312165967Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 173, in __init__
2024-05-10T22:36:41.312171587Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] self.output_size_per_partition = divide(output_size, tp_size)
2024-05-10T22:36:41.312176897Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/parallel_utils/utils.py", line 19, in divide
2024-05-10T22:36:41.312182467Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] ensure_divisibility(numerator, denominator)
2024-05-10T22:36:41.312187737Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/parallel_utils/utils.py", line 12, in ensure_divisibility
2024-05-10T22:36:41.312199477Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] assert numerator % denominator == 0, "{} is not divisible by {}".format(
2024-05-10T22:36:41.312205177Z (RayWorkerVllm pid=7595) ERROR 05-10 22:36:41 ray_utils.py:44] AssertionError: 13653 is not divisible by 4
2024-05-10T22:36:41.312211187Z (RayWorkerVllm pid=7380) INFO 05-10 22:36:36 selector.py:16] Using FlashAttention backend. [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
2024-05-10T22:36:41.313330131Z Traceback (most recent call last):
2024-05-10T22:36:41.313366891Z File "/usr/local/bin/cot-eval", line 8, in <module>
2024-05-10T22:36:41.313526370Z sys.exit(main())
2024-05-10T22:36:41.313550730Z File "/workspace/cot-eval/src/cot_eval/__main__.py", line 149, in main
2024-05-10T22:36:41.313593179Z llm = VLLM(
2024-05-10T22:36:41.313605389Z File "/usr/local/lib/python3.10/dist-packages/langchain_core/load/serializable.py", line 120, in __init__
2024-05-10T22:36:41.313659589Z super().__init__(**kwargs)
2024-05-10T22:36:41.313672039Z File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 341, in __init__
2024-05-10T22:36:41.313752498Z raise validation_error
2024-05-10T22:36:41.313823148Z pydantic.v1.error_wrappers.ValidationError: 1 validation error for VLLM
2024-05-10T22:36:41.313833768Z __root__
2024-05-10T22:36:41.313839598Z 13653 is not divisible by 4 (type=assertion_error)
2024-05-10T22:36:43.599831555Z (RayWorkerVllm pid=7380) INFO 05-10 22:36:36 pynccl_utils.py:45] vLLM is using nccl==2.18.1 [repeated 2x across cluster]
2024-05-10T22:36:43.599894725Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] Error executing method load_model. This might cause deadlock in distributed execution. [repeated 2x across cluster]
2024-05-10T22:36:43.599908335Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] Traceback (most recent call last): [repeated 2x across cluster]
2024-05-10T22:36:43.599914535Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 37, in execute_method [repeated 2x across cluster]
2024-05-10T22:36:43.599923215Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] return executor(*args, **kwargs) [repeated 2x across cluster]
2024-05-10T22:36:43.599931985Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model [repeated 4x across cluster]
2024-05-10T22:36:43.599940855Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.model_runner.load_model() [repeated 2x across cluster]
2024-05-10T22:36:43.599977715Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.model = get_model( [repeated 2x across cluster]
2024-05-10T22:36:43.599983425Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 91, in get_model [repeated 2x across cluster]
2024-05-10T22:36:43.599992975Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] model = model_class(model_config.hf_config, linear_method) [repeated 2x across cluster]
2024-05-10T22:36:43.599998554Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 173, in __init__ [repeated 10x across cluster]
2024-05-10T22:36:43.600010064Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.transformer = JAISModel(config, linear_method) [repeated 2x across cluster]
2024-05-10T22:36:43.600016424Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.h = nn.ModuleList([ [repeated 2x across cluster]
2024-05-10T22:36:43.600022014Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/jais.py", line 231, in <listcomp> [repeated 2x across cluster]
2024-05-10T22:36:43.600032144Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] JAISBlock(config, linear_method) [repeated 2x across cluster]
2024-05-10T22:36:43.600040194Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.mlp = JAISMLP(inner_dim, config, linear_method) [repeated 2x across cluster]
2024-05-10T22:36:43.600045794Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.c_fc = ColumnParallelLinear( [repeated 2x across cluster]
2024-05-10T22:36:43.600054054Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] self.output_size_per_partition = divide(output_size, tp_size) [repeated 2x across cluster]
2024-05-10T22:36:43.600059674Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/parallel_utils/utils.py", line 19, in divide [repeated 2x across cluster]
2024-05-10T22:36:43.600068564Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] ensure_divisibility(numerator, denominator) [repeated 2x across cluster]
2024-05-10T22:36:43.600076354Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/parallel_utils/utils.py", line 12, in ensure_divisibility [repeated 2x across cluster]
2024-05-10T22:36:43.600085134Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] assert numerator % denominator == 0, "{} is not divisible by {}".format( [repeated 2x across cluster]
2024-05-10T22:36:43.600100124Z (RayWorkerVllm pid=7380) ERROR 05-10 22:36:41 ray_utils.py:44] AssertionError: 13653 is not divisible by 4 [repeated 2x across cluster]
Yes, this might be tricky: I just tried loading core42/jais-13b-chat
on a single NVIDIA A100-SXM4-40GB and running inference with vLLM, and that worked fine.
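For reference, here is a minimal sketch of the check that fails in the traceback, to illustrate why a single-GPU run could pass while the 4-way run does not. It is not vLLM's actual code: the `.format(...)` arguments are cut off in the log, so `(numerator, denominator)` is an assumption, and `compatible_tp_sizes` is a hypothetical helper added only for illustration. The value 13653 is taken from the AssertionError; judging by the `JAISMLP(inner_dim, ...)` frame it is presumably the MLP inner dimension.

```python
def ensure_divisibility(numerator: int, denominator: int) -> None:
    """Sketch of the assert seen in the traceback (format args assumed)."""
    assert numerator % denominator == 0, "{} is not divisible by {}".format(
        numerator, denominator
    )


def compatible_tp_sizes(output_size: int, max_tp: int = 8) -> list[int]:
    """Hypothetical helper: tensor-parallel sizes that divide output_size evenly."""
    return [tp for tp in range(1, max_tp + 1) if output_size % tp == 0]


inner_dim = 13653  # value reported in the AssertionError

# 13653 is odd, so it cannot be split across 2, 4 or 8 partitions.
print(inner_dim % 4)
print(compatible_tp_sizes(inner_dim))

# With a tensor-parallel size of 1 (assumed for the single-GPU run) the check passes.
ensure_divisibility(inner_dim, 1)
```

If this reading is right, the error would depend on the tensor-parallel size rather than on the model weights themselves, which fits the single-GPU run working fine.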
For XX in [13b, 13b-chat, 30b-v3, 30b-chat-v3]:
Check upon issue creation:
Parameters:
ToDos: