Closed Moses0h closed 1 year ago
For the demo, running batch generation means waiting for inference on the whole batch to complete, which makes for a poor user experience. You can use this code for batch generation.
Closing this for now; we will add a batch inference option to our to-do list.
Writing here because this is related.
I'm currently using 0 as the padding token (correct me if I'm wrong), with left-padding and padding="max_length". However, the more padding there is, the weirder the generated output becomes.
For example, input_ids of [0, 0, 0, 0, ..., 23, 5, 143, 24, ...] produce weird generated outputs, while just [23, 5, 143, 24, ...] works normally.
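One possible cause of the behavior described above, offered as a guess rather than a diagnosis of this repository, is that the pad tokens are being attended to like real tokens. Below is a minimal pure-Python sketch of left-padding a batch together with the attention mask and position ids that usually go with it. The helper names `left_pad_batch` and `position_ids_from_mask` are hypothetical, and pad id 0 is taken from the comment above, not confirmed for this model.

```python
from typing import List, Sequence, Tuple


def left_pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to the longest one in the batch.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding. pad_id=0 is an assumption from the thread.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def position_ids_from_mask(mask: Sequence[int]) -> List[int]:
    """Count positions over real tokens only, so the first real token
    gets position 0 no matter how much padding precedes it.
    Padded slots get a placeholder position of 0."""
    positions: List[int] = []
    seen = 0
    for m in mask:
        if m:
            positions.append(seen)
            seen += 1
        else:
            positions.append(0)
    return positions


# Usage: the second sequence is shorter, so it is padded on the left.
batch = [[23, 5, 143, 24], [7, 9]]
input_ids, attention_mask = left_pad_batch(batch)
position_ids = [position_ids_from_mask(m) for m in attention_mask]
```

Padding to the longest sequence in the batch, rather than to a fixed maximum length, keeps the number of pad tokens small. Whether the model's forward and generation code actually accept an attention mask and position ids is something to check in the repository itself.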
I'm trying to run batch generation, but greedy_search() seems to work only for a single input_ids sequence. I'm curious why you implemented greedy_search()?