Updating transformers beyond v4.43.4 breaks the CI tests in legacy mode. The bloom tests fail with:
FAILED test_non_persistent_deployment.py::test_single_GPU[None-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3-non-persistent] - ValueError: not enough values to unpack (expected 2, got 0)
FAILED test_local_deployment.py::test_session[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_multi_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_single_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_meta_tensor[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-False-1-True-False-ds_config0-2-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_load_to_sys_mem[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_restful_api[query0-28080-None-bigscience/bloom-560m-local-50050-text-generation-fp16-1-False-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_replicas[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-False-True-False-ds_config0-2] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
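The failure also reproduces outside the MII test harness with plain DeepSpeed kernel injection on the same checkpoint. A minimal sketch, assuming a single GPU; the model name and dtype come from the test IDs above, while the other arguments are our assumptions, not the exact CI invocation:

```python
import torch
import deepspeed
from transformers import pipeline

# Same task and checkpoint as the failing tests.
pipe = pipeline("text-generation", model="bigscience/bloom-560m",
                torch_dtype=torch.float16, device=0)

# Replace the bloom transformer blocks with DeepSpeed's fused inference
# kernels, as MII's legacy path does under the hood.
pipe.model = deepspeed.init_inference(
    pipe.model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

# With transformers <= 4.43.4 this generates text; beyond that it raises
# "ValueError: not enough values to unpack (expected 2, got 0)" inside
# deepspeed/ops/transformer/inference/ds_attention.py.
print(pipe("DeepSpeed is", max_new_tokens=16))
```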
We have isolated the problem to this transformers commit: https://github.com/huggingface/transformers/pull/31445. Traceback from the failing bloom tests:
../../mii/legacy/client.py:144: in query
    return task_methods.run_inference(inference_pipeline, args, query_kwargs)
../../mii/legacy/method_table.py:101: in run_inference
    response = inference_pipeline(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:262: in __call__
    return super().__call__(text_inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1238: in __call__
    outputs = list(final_iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:124: in __next__
    item = next(self.iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:125: in __next__
    processed = self.infer(item, **self.params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1164: in forward
    model_outputs = self._forward(model_inputs, **forward_params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:351: in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/inference/engine.py:631: in _generate
    return self.module.generate(*inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2024: in generate
    result = self._sample(
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2982: in _sample
    outputs = self(**model_inputs, return_dict=True)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:955: in forward
    transformer_outputs = self.transformer(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:744: in forward
    outputs = block(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py:162: in forward
    self.attention(input,
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py:168: in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
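The unpack failure is consistent with a cache-format change in that PR. Our reading (an assumption, not confirmed upstream) is that bloom moved from per-layer `(key, value)` tuples to the shared `Cache` API, so DeepSpeed's injected attention receives an empty `DynamicCache` where it expects a 2-tuple. A sketch of the mismatch:

```python
from transformers import DynamicCache

# Old bloom contract: layer_past is a (key, value) tuple per layer.
layer_past = ("key_states", "value_states")  # placeholders for the tensors
key, value = layer_past                      # unpacks fine: exactly 2 items

# New contract: a Cache object is threaded through instead. An empty
# DynamicCache iterates over zero layers, so tuple-unpacking it reproduces
# the exact error seen in compute_attention:
layer_past = DynamicCache()
key, value = layer_past  # ValueError: not enough values to unpack (expected 2, got 0)
```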
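Until the kernel-injection path handles the new cache format, the straightforward mitigation is to pin transformers in the CI environment. A minimal version guard; the `LAST_GOOD` constant and the check itself are our sketch, not existing MII code:

```python
from packaging.version import Version
import transformers

# Last transformers release known to work with the legacy bloom tests.
LAST_GOOD = Version("4.43.4")

if Version(transformers.__version__) > LAST_GOOD:
    raise RuntimeError(
        f"transformers {transformers.__version__} breaks DeepSpeed kernel "
        f"injection for bloom; pin transformers<={LAST_GOOD} for the legacy tests."
    )
```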