triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

T5 cross_attention output cannot be accessed #81

Open JustinAWei opened 1 year ago

JustinAWei commented 1 year ago

Description

As defined in the FasterTransformer T5 guide, there is an output value for cross_attentions. However, I cannot find any way to return cross_attentions from the FasterTransformer Triton backend for T5.

For reference:

| Name | Tensor/Parameter Shape | Location | Data Type | Description |
|------|------------------------|----------|-----------|-------------|
| output_ids | [batch_size, beam_width, max_output_seq_len] | GPU | int | The output ids. Contains the input ids and generated ids |
| sequence_length | [batch_size, beam_width] | GPU | int | The lengths of the output ids |
| output_log_probs | [batch_size, beam_width, request_output_seq_len] | GPU | float | Optional. Records the log probability of logits at each step for sampling |
| cum_log_probs | [batch_size, beam_width] | GPU | float | Optional. Cumulative log probability of generated sentences |
| cross_attentions | [num_layer / pipeline_para_size, batch_size, beam_width, head_num / tensor_para_size, max_seq_len, mem_max_seq_len] | GPU | float | Optional. The attention scores of cross attention |
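To make the cross_attentions shape formula concrete, here is a minimal sketch that computes the tensor shape from the parallelism settings. All the numeric values below are hypothetical example values, not taken from the issue; the real values depend on your T5 model and deployment configuration.

```python
# Hypothetical example configuration (not from the issue):
num_layer = 24            # decoder layers, e.g. a T5-large-sized model
pipeline_para_size = 2    # pipeline parallelism degree
batch_size = 4
beam_width = 2
head_num = 16             # attention heads per layer
tensor_para_size = 2      # tensor parallelism degree
max_seq_len = 32          # decoder (output) sequence length
mem_max_seq_len = 128     # encoder memory sequence length

# Shape per the table:
# [num_layer / pipeline_para_size, batch_size, beam_width,
#  head_num / tensor_para_size, max_seq_len, mem_max_seq_len]
cross_attentions_shape = (
    num_layer // pipeline_para_size,
    batch_size,
    beam_width,
    head_num // tensor_para_size,
    max_seq_len,
    mem_max_seq_len,
)
print(cross_attentions_shape)  # -> (12, 4, 2, 8, 32, 128)
```

Note that each pipeline/tensor parallel rank holds only its own slice of the layers and heads, which is why the first and fourth dimensions are divided by the parallelism degrees.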
byshiue commented 1 year ago

The API is not exposed in the Triton backend yet; that is why you cannot find this output in the T5 documentation of the Triton backend.