Status: Open. ccmaymay opened this issue 2 years ago.
From @nweir127
I am calling it on a prompt that is 600-1000 tokens long, with `max_new_tokens=36`:

```python
model.complete(text, stop_strings=['QUESTION'], top_p=0.8, num_return_sequences=1)
```
Each call is taking 100+ seconds on 8 GPUs.
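To pin down where the time goes, a wall-clock wrapper around the call is a simple first step. This is a minimal sketch: `timed_complete` and the stand-in `fake_complete` below are illustrative helpers, not part of the library; only `model.complete` and its arguments come from the report above.

```python
import time

def timed_complete(complete_fn, text, **kwargs):
    """Call a completion function and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = complete_fn(text, **kwargs)
    return result, time.perf_counter() - start

# Illustrative usage with a stand-in for model.complete:
def fake_complete(text, **kwargs):
    return text + " ANSWER"

out, seconds = timed_complete(
    fake_complete, "QUESTION: why so slow?",
    stop_strings=['QUESTION'], top_p=0.8, num_return_sequences=1,
)
```

Logging `seconds` across calls with different prompt lengths would show whether latency scales with the 600-1000 token prompt or is dominated by fixed per-call overhead.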