hltcoe / sandle

Run a large language modeling SANDbox in your Local Environment

Investigate poor performance on large prompts #58

Open ccmaymay opened 2 years ago

ccmaymay commented 2 years ago

From @nweir127

I am calling it on a prompt that is 600-1000 tokens long with max_new_tokens of 36.

I am calling:

model.complete(text, stop_strings=['QUESTION'], top_p=0.8, num_return_sequences=1)

Each call is taking 100+ seconds on 8 GPUs.
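To narrow down where the time is going, it may help to measure each call's wall-clock latency directly. Below is a minimal timing sketch; `fake_complete` is a hypothetical stand-in for the project's `model.complete` (whose real signature, as quoted above, takes `text`, `stop_strings`, `top_p`, and `num_return_sequences`), since the actual model object is not shown in this issue.

```python
import time
from typing import Callable, Any, Tuple


def timed_call(fn: Callable, *args, **kwargs) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical stand-in for model.complete; the real call from this issue is:
# model.complete(text, stop_strings=['QUESTION'], top_p=0.8, num_return_sequences=1)
def fake_complete(text, stop_strings=None, top_p=1.0, num_return_sequences=1):
    return text[:10]


result, seconds = timed_call(
    fake_complete, "x" * 800, stop_strings=["QUESTION"], top_p=0.8
)
print(f"call took {seconds:.3f}s")
```

Wrapping a handful of calls this way (varying prompt length while holding `max_new_tokens` fixed) would show whether latency grows with prompt size, which points at prompt processing rather than the 36-token generation loop.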