As defined in the FasterTransformer T5 guide, there is an output value for `cross_attentions`, but I cannot find any way to return `cross_attentions` from the FasterTransformer Triton backend for T5.
For reference:
Output of T5 Decoding:

| Name | Tensor/Parameter Shape | Location | Data Type | Description |
|---|---|---|---|---|
| `output_ids` | [batch_size, beam_width, max_output_seq_len] | GPU | int | The output ids. It contains the input ids and generated ids |
| `sequence_length` | [batch_size, beam_width] | GPU | int | The lengths of the output ids |
| `output_log_probs` | [batch_size, beam_width, request_output_seq_len] | GPU | float | Optional. Records the log probability of logits at each step for sampling |
| `cum_log_probs` | [batch_size, beam_width] | GPU | float | Optional. Cumulative log probability of generated sentences |
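To make the shapes above concrete, here is a minimal NumPy sketch of what the documented decoding outputs look like on the client side. The sizes (`batch_size`, `beam_width`, `max_output_seq_len`) are hypothetical values chosen for illustration, not anything prescribed by the guide:

```python
import numpy as np

# Hypothetical sizes, for illustration only.
batch_size, beam_width, max_output_seq_len = 2, 4, 16

# Shapes as documented for the T5 Decoding outputs.
output_ids = np.zeros((batch_size, beam_width, max_output_seq_len), dtype=np.int32)
sequence_length = np.full((batch_size, beam_width), max_output_seq_len, dtype=np.int32)
output_log_probs = np.zeros((batch_size, beam_width, max_output_seq_len), dtype=np.float32)

# cum_log_probs is per beam: one scalar for each (batch, beam) pair,
# i.e. the sum of the per-step log probabilities.
cum_log_probs = output_log_probs.sum(axis=-1)

print(output_ids.shape)        # (2, 4, 16)
print(sequence_length.shape)   # (2, 4)
print(cum_log_probs.shape)     # (2, 4)
```

Note that `cross_attentions` does not appear anywhere in this output list, which matches the problem described above: the Triton backend exposes no tensor for it.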