Closed WizardMx closed 1 year ago
It seems the `max_seq_len` parameter affects inference time significantly
"max_seq_len" determines the sequence length used for batch inference. Decreasing this value can greatly reduce inference time, but may lead to incomplete responses.
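
To make the trade-off concrete, here is a minimal toy sketch in plain Python. It is not the repository's inference code: the `generate` function, the token lists, and the cap values are all hypothetical, and it assumes one decoding step per generated token with the prompt counted against `max_seq_len`. How cost actually scales depends on the real implementation.

```python
def generate(prompt_tokens, full_response, max_seq_len):
    """Toy decoder (hypothetical): emit tokens until done or the length cap is hit."""
    # Assumption: the prompt counts against max_seq_len, so this is the room left.
    budget = max(0, max_seq_len - len(prompt_tokens))
    output = full_response[:budget]
    steps = len(output)  # assumption: one forward pass per generated token
    complete = len(output) == len(full_response)
    return output, steps, complete


prompt = ["Explain", "attention", "briefly"]
response = ["Attention", "weights", "each", "token", "by", "relevance", "."]

for cap in (16, 6):
    out, steps, complete = generate(prompt, response, cap)
    print(f"max_seq_len={cap}: steps={steps}, complete={complete}")
```

With the larger cap the toy response finishes. With the smaller cap fewer decoding steps run, but the response is cut off before it is complete.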