Closed: zpcore closed this issue 5 months ago
are we using nightly HF as well?
No, Torchbench is using transformers==4.38. We use the same config.
The failure is related to this PR: https://github.com/pytorch/xla/pull/6792. Here is the chat from @JackCaoG :
@Jiewen Tan I think we need to think about how to support flash_attention with dynamo. It seems like if a user puts your flash_attention wrapper (https://github.com/pytorch/xla/blob/master/torch_xla/experimental/custom_kernel.py#L147-L185) inside the torch.compile region, dynamo will try to step through the JAX Python code and eventually fail.
Based on your original PR https://github.com/pytorch/xla/pull/6477, do you expect users to call flash_attention outside of torch.compile, extract the payload, and use that in the torch.compile region?
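As a hedged, plain-Python sketch of the pattern being asked about (the names `build_payload` and `compiled_region` are made up for illustration; this is not the torch_xla or dynamo API): build the non-traceable kernel artifact once, outside the compiled region, and pass the resulting payload into the compiled function as opaque data.

```python
# Hedged illustration only: names below are hypothetical, not torch_xla APIs.

def build_payload(op_name):
    # Stand-in for the JAX-based kernel lowering that dynamo cannot trace.
    return f"serialized-kernel-for-{op_name}"

def compiled_region(x, payload):
    # Imagine this function wrapped in torch.compile: it only consumes the
    # opaque payload; it never runs the untraceable construction code itself.
    return x * 2, payload

payload = build_payload("flash_attention")  # built outside the compiled region
result = compiled_region(21, payload)       # payload passed in as plain data
```

This sidesteps the problem above because dynamo only ever sees the payload as a value, not the JAX code that produced it.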
Waiting for pending fix from @alanwaketan.
Sorry, I double-checked and the failure should be due to the changes in torchbench. Let me confirm which PR and I will make an update.
The issue is caused by a change in torchbench upstream: https://github.com/pytorch/benchmark/pull/2197.
In torchbenchmark/util/framework/huggingface/model_factory.py,
def get_module(self):
    return self.model, self.example_inputs

the returned self.example_inputs becomes a dict instead of a list of tensors.
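A hedged sketch of why this matters to callers (`fake_model` and `call_module` are illustrative names, not torchbench code): a dict of example inputs must be splatted with `**`, so harness code that unpacks positionally with `*` breaks.

```python
# Illustrative only: not actual torchbench code.

def fake_model(input_ids=None, attention_mask=None):
    # Stand-in for a HuggingFace model's forward().
    return input_ids, attention_mask

def call_module(model, example_inputs):
    # Harness code now has to handle both shapes of example_inputs.
    if isinstance(example_inputs, dict):
        return model(**example_inputs)   # new torchbench behavior: a dict
    return model(*example_inputs)        # old behavior: a list/tuple of tensors

new_style = call_module(fake_model, {"input_ids": [101], "attention_mask": [1]})
old_style = call_module(fake_model, ([101], [1]))
```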
🐛 Bug
Torchbench models hf_Albert, hf_Bart, hf_Bert, hf_Bert_large, hf_BigBird, hf_DistilBert, hf_GPT2, hf_GPT2_large, hf_Longformer, hf_Reformer, hf_T5, hf_T5_base, hf_T5_generate, and hf_T5_large all failed recently with the error shown below.
To Reproduce
Steps to reproduce the behavior:
Error log:
from user code:
  File "/tmp/xla/benchmarks/benchmark_model.py", line 170, in eval
    pred = self.module(*inputs)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1360, in forward
    outputs = self.bert(
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 961, in forward
    input_shape = input_ids.size()
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
ERROR:main:ERROR in subprocess
INFO:main:Run with --model-config={"model_name": "hf_Bert_large"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": null, "test": "eval"}
WARNING:main:Enabling fast F32 multiplication for PyTorch
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711990872.886326 2060 service.cc:145] XLA service 0x55c4d7299290 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1711990872.886425 2060 service.cc:153] StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
I0000 00:00:1711990872.886901 2060 se_gpu_pjrt_client.cc:853] Using BFC allocator.
I0000 00:00:1711990872.886984 2060 gpu_helpers.cc:107] XLA backend allocating 31724126208 bytes on device 0 for BFCAllocator.
I0000 00:00:1711990872.887012 2060 gpu_helpers.cc:147] XLA backend will use up to 10574708736 bytes on device 0 for CollectiveBFCAllocator.
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 945, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 941, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 345, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 302, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 218, in _default_iter_fn
    output = benchmark_model.model_iter_fn(
  File "/tmp/xla/benchmarks/benchmark_model.py", line 170, in eval
    pred = self.module(*inputs)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1360, in forward
    outputs = self.bert(
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 960, in forward
    self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
  File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 4169, in warn_if_padding_and_no_attention_mask
    if self.config.pad_token_id in input_ids[:, [-1, 0]]:
TypeError: string indices must be integers
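The TypeError follows directly from unpacking the new dict positionally: `*` on a dict yields its keys (strings), so `input_ids` arrives as the string "input_ids" and the tensor-style indexing in transformers fails. A minimal repro sketch (the `forward` here is illustrative, not the real modeling_bert code):

```python
# Illustrative repro: a dict splatted with * passes its KEYS positionally.
inputs = {"input_ids": [[101, 2023]], "attention_mask": [[1, 1]]}

def forward(input_ids=None, attention_mask=None):
    # Mimics the failing line in transformers, which assumes a tensor here.
    return input_ids[:, [-1, 0]]

try:
    forward(*inputs)  # input_ids becomes the string "input_ids"
    raised = False
except TypeError:     # "string indices must be integers" (wording varies by Python version)
    raised = True
```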