triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

T5 cross_attention output cannot be accessed #81

Open JustinAWei opened 1 year ago

JustinAWei commented 1 year ago

Description

As defined in the FasterTransformer T5 guide, there is an output value for cross_attentions. However, I cannot find any way to return cross_attentions from the FasterTransformer Triton backend for T5.

For reference:

| Name | Tensor/Parameter Shape | Location | Data Type | Description |
|------|------------------------|----------|-----------|-------------|
| output_ids | [batch_size, beam_width, max_output_seq_len] | GPU | int | The output ids. Contains the input ids and generated ids |
| sequence_length | [batch_size, beam_width] | GPU | int | The lengths of the output ids |
| output_log_probs | [batch_size, beam_width, request_output_seq_len] | GPU | float | Optional. Records the log probability of logits at each step for sampling |
| cum_log_probs | [batch_size, beam_width] | GPU | float | Optional. Cumulative log probability of generated sentences |
| cross_attentions | [num_layer / pipeline_para_size, batch_size, beam_width, head_num / tensor_para_size, max_seq_len, mem_max_seq_len] | GPU | float | Optional. The attention scores of cross attention |
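To make the cross_attentions shape formula concrete, here is a minimal sketch that computes the tensor shape from the parallelism settings. All the numeric values below are hypothetical example values, not taken from the issue; the real values depend on your T5 model and deployment configuration.

```python
# Hypothetical example configuration (not from the issue):
num_layer = 24            # decoder layers, e.g. a T5-large-sized model
pipeline_para_size = 2    # pipeline parallelism degree
batch_size = 4
beam_width = 2
head_num = 16             # attention heads per layer
tensor_para_size = 2      # tensor parallelism degree
max_seq_len = 32          # decoder (output) sequence length
mem_max_seq_len = 128     # encoder memory sequence length

# Shape per the table:
# [num_layer / pipeline_para_size, batch_size, beam_width,
#  head_num / tensor_para_size, max_seq_len, mem_max_seq_len]
cross_attentions_shape = (
    num_layer // pipeline_para_size,
    batch_size,
    beam_width,
    head_num // tensor_para_size,
    max_seq_len,
    mem_max_seq_len,
)
print(cross_attentions_shape)  # -> (12, 4, 2, 8, 32, 128)
```

Note that each pipeline/tensor parallel rank holds only its own slice of the layers and heads, which is why the first and fourth dimensions are divided by the parallelism degrees.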
byshiue commented 1 year ago

The API is not exposed in the Triton backend yet; that is why you cannot find this output in the T5 documentation of the Triton backend.