chrsmcgrr opened this issue 3 days ago
I already have a pragmatic solution and would like to get feedback on it.
I would replace the following line in the model:
```python
hidden_states[~expand_attention_mask] = 0
```
with
```python
hidden_states = hidden_states * expand_attention_mask.to(hidden_states.dtype)
```
This sidesteps the issue. The real fix likely belongs in PyTorch itself, but I have yet to create a small reproducer.
For now this change will unblock the model. I will open a PR shortly.
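For illustration, a minimal standalone sketch (shapes and mask values are hypothetical, not taken from the model) showing that the multiply form produces the same result as the in-place boolean assignment it replaces:

```python
import torch

hidden_states = torch.randn(2, 4, 8)
# (batch, seq) padding mask, expanded over the hidden dimension
attention_mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0]], dtype=torch.bool)
expand_attention_mask = attention_mask.unsqueeze(-1).expand_as(hidden_states)

# Original form: in-place assignment through a boolean mask
masked_inplace = hidden_states.clone()
masked_inplace[~expand_attention_mask] = 0

# Proposed form: elementwise multiply, no in-place boolean indexing
masked_mul = hidden_states * expand_attention_mask.to(hidden_states.dtype)

assert torch.equal(masked_inplace, masked_mul)
```

Both zero out the padded positions; the multiply avoids the data-dependent in-place write that trips up tracing.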
System Info
transformers version: 4.46.0.dev0

Who can help?
@ylacombe, @eustlb
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
Running the following script:
Causes the following error:
Expected behavior
`torch.export` completes without raising an exception