huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Adaptive output and contextual dialogue capabilities of text-generation-inference #424

Open MLikeWater opened 11 months ago

MLikeWater commented 11 months ago

System Info

HL-SMI Version: hl-1.11.0-fw-45.1.1.1
Driver Version: 1.11.0-e6eb0fd

Reproduction

I deployed the Llama-2-7b-chat-hf model through text-generation-inference, but with the following command there is no adaptive output: instead of stopping when the answer is complete, generation always runs to the full max_new_tokens.

curl 127.0.0.1:8080/generate_stream -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":200}}' -H 'Content-Type: application/json'
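In general, generation backends stop early when the model emits its end-of-sequence token; if the stream still runs to max_new_tokens, a common workaround is to truncate the completion client-side at known stop markers. A minimal sketch (the function name and marker strings are illustrative, not a text-generation-inference API):

```python
# Cut a completion at the first occurrence of any stop sequence (e.g. an
# EOS marker or a dialogue-turn prefix), so the visible output ends
# naturally even if the server generated up to max_new_tokens.

def truncate_at_stop(text, stop_sequences):
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep only text before the earliest marker
    return text[:cut]

out = truncate_at_stop("Deep learning is...</s> trailing tokens", ["</s>", "\nUser:"])
print(out)  # "Deep learning is..."
```

The same idea can be applied incrementally while consuming a token stream, aborting the request once a stop marker appears.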

Also, how can chat functionality with context be implemented? Similar to GPT-4, the model should adaptively produce an appropriately sized answer and carry on a dialogue that takes the previous turns into account.
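Contextual dialogue with a stateless endpoint like /generate is usually achieved by concatenating the conversation history into a single prompt on each request. For Llama-2-chat this is commonly done with the [INST]/<<SYS>> template; a hedged sketch (the helper name is hypothetical, and the exact template should be checked against the model's tokenizer configuration):

```python
# Hypothetical helper: fold a running conversation into one Llama-2-chat
# prompt so the model sees prior turns as context. Follows the commonly
# documented [INST] / <<SYS>> convention for Llama-2-chat models.

def build_llama2_prompt(system, turns):
    """turns: list of (user, assistant) pairs; the final assistant is None
    for the turn the model should answer next."""
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0:
            # The system prompt is embedded inside the first user turn.
            user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        if assistant is None:
            prompt += f"<s>[INST] {user} [/INST]"
        else:
            prompt += f"<s>[INST] {user} [/INST] {assistant} </s>"
    return prompt

prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    [("What is Deep Learning?", "Deep learning is a branch of ML."),
     ("Give one example.", None)],
)
print(prompt)
```

The resulting string would then be sent as the "inputs" field of the /generate request; the client appends each new exchange to the history before the next call.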

Expected behavior

  1. adaptive output
  2. dialogue with context
regisss commented 11 months ago

@MLikeWater What do you mean exactly by adaptive output?