Closed — jxchenus closed this 3 weeks ago
Pasting the discussion thread here:
Regarding the trtllm backend output: we're using the two outputs from the config directly, `output_ids` and `sequence_length`. The input `request_output_len` is required (a value of 0 or -1 is not valid). Our input has a length of 37:

```
[ 3 108 27173 4435 44698 414 409 3812 423 4 125000 146304 146305 146306 146307 146308 146309 146310 146311 146312 146313 146314 146315 146316 146317 146318 146319 146320 146321 146322 146323 146324 146325 146326 146327 146328 5 ]
```

The expected output is `108 468 109`, with 109 being the END_ID. When we use `request_output_len=8`, we get this output:

```
output_ids: [[ 3 108 27173 4435 44698 414 409 3812 423 4 125000 146304 146305 146306 146307 146308 146309 146310 146311 146312 146313 146314 146315 146316 146317 146318 146319 146320 146321 146322 146323 146324 146325 146326 146327 146328 5 108 468 109 108 1539 109 108 109]]
sequence_length: [45]
```

The biggest problem here is that generation continues after the first END_ID (109) and stops only when it reaches `request_output_len`. For `sequence_length`, we would expect it to be 40 (37 + 3) rather than 45 (37 + 8).
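As a client-side workaround, the output can be trimmed at the first END_ID past the input. This is a sketch, not part of the backend; `truncate_at_end_id` is a hypothetical helper, and the token ids are the ones from the example above:

```python
END_ID = 109  # end_id token from the example above


def truncate_at_end_id(output_ids, input_length, end_id=END_ID):
    """Trim the generated portion at the first end_id (inclusive)
    and return (trimmed_ids, effective_sequence_length)."""
    for i in range(input_length, len(output_ids)):
        if output_ids[i] == end_id:
            return output_ids[: i + 1], i + 1
    return output_ids, len(output_ids)


# Input from the example: 37 tokens, ending with 5.
input_ids = [3, 108, 27173, 4435, 44698, 414, 409, 3812, 423, 4,
             125000] + list(range(146304, 146329)) + [5]
# What the backend returned with request_output_len=8 (45 tokens total).
output_ids = input_ids + [108, 468, 109, 108, 1539, 109, 108, 109]

trimmed, seq_len = truncate_at_end_id(output_ids, len(input_ids))
print(seq_len)  # 40, matching the expected 37 + 3
```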
Ah, I see what you meant now. I also observe that if I pass in input token ids directly, it always generates `request_output_len` many output token ids, but when I pass in something other than token ids (like a text prompt), generation ends without reaching the `request_output_len` length.
When using the tensorrtllm backend, the value in the `sequence_length` output tensor is always the sum of `input_lengths` and `request_output_len`, and does not reflect the position of the end_id token. In contrast, when using the python backend, if we set `output_sequence_lengths` to true, the value in the `sequence_lengths` output tensor reflects the position of the first end_id token.
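Plugging the numbers from the example above into the two semantics makes the difference concrete (a sketch; `first_end_id_offset` is just the observed position of the first END_ID in the generated tokens, not a backend parameter):

```python
# Values from the example request above.
input_length = 37
request_output_len = 8
first_end_id_offset = 3  # END_ID 109 appears 3 tokens into generation

# tensorrtllm backend: sequence_length is always input + request_output_len.
trtllm_seq_len = input_length + request_output_len    # 45

# python backend with output_sequence_lengths=true: length up to the
# first end_id token.
python_seq_len = input_length + first_end_id_offset   # 40

print(trtllm_seq_len, python_seq_len)  # 45 40
```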