Closed: federicoruggeri closed this issue 2 years ago.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @federicoruggeri,
If I remember correctly, TF LED cannot be compiled with `output_attention_mask=True`. This is a difficult bug and I don't think we'll be able to allocate time soon to solve this, I'm afraid :-/
If you'd be willing to open a PR and dig deeper into this, I'm happy to help however I can!
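For readers landing here, the "compile" path being discussed is the `tf.function`-wrapped forward pass. Below is a minimal sketch of such a call; note that `output_attention_mask` in the comment above does not match a current LED argument, so the sketch assumes the setting meant is `output_attentions=True`, and the checkpoint and input here are placeholders rather than part of the original report.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Placeholder setup; mirrors the checkpoint used later in this issue.
tokenizer = AutoTokenizer.from_pretrained('allenai/led-base-16384')
led = TFAutoModelForSeq2SeqLM.from_pretrained('allenai/led-base-16384', from_pt=True)


@tf.function  # graph compilation, as opposed to a plain eager call
def compiled_forward(input_ids, attention_mask):
    return led(input_ids=input_ids,
               attention_mask=attention_mask,
               output_attentions=True,  # assumption: the flag meant above
               training=False)


batch = tokenizer('a short test document', return_tensors='tf')
outputs = compiled_forward(batch['input_ids'], batch['attention_mask'])
```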
Environment info
`transformers` version: 4.4.2

Who can help
@patrickvonplaten @Rocketknight1
Information
Model I am using (Bert, XLNet ...): 'allenai/led-base-16384' via AutoModelForSeq2SeqLM
The problem arises when using:
The tasks I am working on are:
To reproduce
Steps to reproduce the behavior:
```python
import numpy as np
import tensorflow as tf
from transformers import TFAutoModelForSeq2SeqLM


@tf.function
def test_gradient(inputs):
    with tf.GradientTape() as tape:
        led_output = led.call(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            labels=inputs['labels'],
            global_attention_mask=inputs['global_attention_mask'] if 'global_attention_mask' in inputs else None,
            training=True,
            use_cache=False,
            return_dict=True,
            output_hidden_states=True)
    return led_output


@tf.function
def test_model(inputs):
    led_output = led.call(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        labels=inputs['labels'],
        global_attention_mask=inputs['global_attention_mask'] if 'global_attention_mask' in inputs else None,
        training=True,
        use_cache=False,
        return_dict=True,
        output_hidden_states=True)
    return led_output


preloaded_name = 'allenai/led-base-16384'
led = TFAutoModelForSeq2SeqLM.from_pretrained(preloaded_name, from_pt=True)

"""
In this example, we have the following shapes:
    input_length  --> 1800
    output_length --> 70
"""
inputs = np.load('inputs_with_mask.npy', allow_pickle=True).item()

print('Inputs...')
for key, value in inputs.items():
    print('Key: {0} - Value: {1}'.format(key, value.shape))

"""
Prints:
    input_ids             - Value: (1, 1800)
    attention_mask        - Value: (1, 1800)
    global_attention_mask - Value: (1, 1800)
    labels                - Value: (1, 70)
"""
```
```python
# Test with gradient tape
led_output = test_gradient(inputs=inputs)

# Test without gradient tape
led_output = test_model(inputs=inputs)
```
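For reference, `test_gradient` above stops at the forward pass. If the compiled call did work, applying the gradients would typically look like the sketch below (not part of the original report; the SGD settings simply mirror the PyTorch snippet further down):

```python
# Sketch only: completes the gradient step that test_gradient() opens.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)


@tf.function
def train_step(inputs):
    with tf.GradientTape() as tape:
        led_output = led(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            labels=inputs['labels'],
            global_attention_mask=inputs.get('global_attention_mask'),
            training=True,
            use_cache=False,
            return_dict=True)
    grads = tape.gradient(led_output.loss, led.trainable_variables)
    optimizer.apply_gradients(zip(grads, led.trainable_variables))
    return led_output.loss
```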
```python
import numpy as np
import torch
from transformers import AutoModelForSeq2SeqLM

preloaded_name = 'allenai/led-base-16384'
led = AutoModelForSeq2SeqLM.from_pretrained(preloaded_name)

"""
NOTE: same inputs as in the TensorFlow example!

In this example, we have the following shapes:
    input_length  --> 1800
    output_length --> 70
"""
inputs = np.load('inputs_with_mask.npy', allow_pickle=True).item()
inputs = {key: torch.Tensor(value.numpy()).long() for key, value in inputs.items()}

print('Inputs...')
for key, value in inputs.items():
    print('Key: {0} - Value: {1}'.format(key, value.shape))

"""
Prints:
    input_ids             - Value: (1, 1800)
    attention_mask        - Value: (1, 1800)
    global_attention_mask - Value: (1, 1800)
    labels                - Value: (1, 70)
"""
```
```python
# Uncomment to set model training mode
# led.train()

led_output = led(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    labels=inputs['labels'],
    global_attention_mask=inputs['global_attention_mask'] if 'global_attention_mask' in inputs else None,
    use_cache=False,
    return_dict=True,
    output_hidden_states=True)

# Uncomment to test with gradient
# led_output['loss'].backward()
# optim = torch.optim.SGD(led.parameters(), lr=1e-2, momentum=0.9)
# optim.step()
```