Closed: Quang-elec44 closed this issue 7 months ago
Thanks for opening this issue @Quang-elec44.
The returned output is a tuple with two tensors:
- `(batch, num_beams, sequence_length)`
- `(batch, num_beams)`

We will provide better documentation on this soon.
Hope it helps :)
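To make the two shapes concrete, here is a minimal sketch of how the tuple might be consumed, e.g. to keep only the highest-scoring beam per batch element. The tensor contents and dimension sizes below are illustrative stand-ins, not the library's actual return values:

```python
import torch

# Hypothetical stand-ins for the two returned tensors, assuming
# batch=2, num_beams=3, sequence_length=5.
sequences = torch.randint(0, 32000, (2, 3, 5))  # (batch, num_beams, sequence_length)
scores = torch.randn(2, 3)                      # (batch, num_beams)

# Pick the highest-scoring beam for each batch element.
best_beam = scores.argmax(dim=-1)               # (batch,)
best_sequences = sequences[torch.arange(sequences.size(0)), best_beam]
print(best_sequences.shape)                     # torch.Size([2, 5])
```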
Why is the return value implemented as a tuple, though? This discrepancy with the shape returned by `transformers.generate()` (i.e. `tuple` vs `torch.Tensor`) prevents me from reusing the exact same code for both models during inference. I would like to know if there are specific reasons for that. Thank you!
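Until the APIs converge, one way to reuse the same inference code for both models is a small adapter that collapses the tuple to a single `(batch, sequence_length)` tensor. This is only a sketch: the function name is mine, and the tuple layout `(batch, num_beams, sequence_length)` / `(batch, num_beams)` is assumed from the reply above:

```python
import torch

def normalize_generate_output(output):
    # Collapse the (sequences, scores) tuple to the best beam per batch
    # element, so downstream code sees the same shape as HF's generate().
    if isinstance(output, tuple):
        sequences, scores = output              # (B, beams, L), (B, beams)
        best = scores.argmax(dim=-1)            # (B,)
        return sequences[torch.arange(sequences.size(0)), best]
    return output                               # already (B, L)

# Works for both return styles (dummy tensors for illustration):
hf_style = torch.zeros(2, 7, dtype=torch.long)                          # (batch, seq_len)
tuple_style = (torch.zeros(2, 4, 7, dtype=torch.long), torch.randn(2, 4))
print(normalize_generate_output(hf_style).shape)     # torch.Size([2, 7])
print(normalize_generate_output(tuple_style).shape)  # torch.Size([2, 7])
```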
Hi, I'm testing the latest version with `TinyLlama/TinyLlama-1.1B-Chat-v0.3`. Here is the full script:

The `generated_ids` is a tuple with two values, like this:
The output is ok:
Can you explain the output? BTW: could you provide documentation for `generation_config`, since it differs from HF's (some parameters are not supported)?