meta-llama / llama

Inference code for Llama models

Doubt with memory and API usage #802

Open Abc11c opened 9 months ago

Abc11c commented 9 months ago

Hi, thanks for the code!

  1. How do I free up memory after one cycle of inference, to avoid running into out-of-memory issues?

  2. I was previously testing chat-completion-style tasks with the OpenAI API:

```python
result = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the API requires a model name; this one is an example
    messages=[
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": user_prompt_1},
        {"role": "assistant", "content": assistent_feed},
        {"role": "user", "content": user_prompt_2},
    ]
)
```

I'm looking at example_chat_completion.py for a similar API. Can you suggest how to implement something like the above?

How do I set up a system role, i.e. the equivalent of ChatGPT's custom instructions ("How would you like ChatGPT to respond?" and "What would you like ChatGPT to know about you to provide better responses?")?
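For reference, example_chat_completion.py in this repo passes a list of dialogs to `generator.chat_completion`, and each dialog is a list of `{"role", "content"}` dicts in the same shape as the OpenAI messages, with an optional leading "system" message. A minimal sketch (the checkpoint paths and prompt strings below are placeholders, not values from this issue):

```python
# Dialog format accepted by Llama.chat_completion in this repo: a list of
# dialogs, each a list of {"role", "content"} messages. The roles are
# "system" (optional, first), then alternating "user"/"assistant",
# ending with a "user" message the model should answer.
dialogs = [
    [
        {"role": "system", "content": "You are an assistant."},  # custom instructions go here
        {"role": "user", "content": "First user prompt"},
        {"role": "assistant", "content": "Earlier assistant reply"},
        {"role": "user", "content": "Follow-up user prompt"},
    ]
]

# Built and called as in example_chat_completion.py (placeholder paths):
# generator = Llama.build(ckpt_dir="...", tokenizer_path="...",
#                         max_seq_len=512, max_batch_size=1)
# results = generator.chat_completion(dialogs, max_gen_len=64,
#                                     temperature=0.6, top_p=0.9)
# print(results[0]["generation"]["content"])
```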

Thanks!
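On the first question: the usual PyTorch pattern is to drop every Python reference to the model and its outputs, run the garbage collector, and then release cached CUDA blocks. A minimal sketch of the reference-dropping part, using a placeholder object and a `weakref` to show the object is actually freed (the torch calls are commented out since they need a CUDA build; this is the general pattern, not code from this repo):

```python
import gc
import weakref

class PlaceholderModel:  # stands in for the Llama generator / output tensors
    pass

model = PlaceholderModel()
watcher = weakref.ref(model)  # lets us check the object was really freed

# After one inference cycle, delete every reference you still hold...
del model
gc.collect()

# ...then, with PyTorch, return cached GPU memory to the driver:
# import torch
# torch.cuda.empty_cache()

assert watcher() is None  # the placeholder has been garbage-collected
```

Note that `torch.cuda.empty_cache()` only helps once no tensor still references the memory, so the `del` + `gc.collect()` step has to come first.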

haozhuang0000 commented 9 months ago

Same question! Can anyone help?

HamidShojanazeri commented 9 months ago

@Abc11c I think it might be easier to work this out with the HF conversational pipeline, or to decouple its components as suits you.
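One way to decouple the components is to assemble the Llama-2 chat prompt string yourself: for the -chat weights, the template wraps the system message in `<<SYS>>` tags inside the first `[INST]` block. A sketch (the helper name is mine, and the HF pipeline call is commented out since it downloads the model):

```python
def build_llama2_prompt(system_msg: str, user_msg: str) -> str:
    """Assemble a single-turn Llama-2-chat prompt string."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_msg}\n"
        "<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

prompt = build_llama2_prompt("You are an assistant.", "Hello!")

# Feeding it through an HF pipeline (requires transformers and model access):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
# print(pipe(prompt, max_new_tokens=64)[0]["generated_text"])
```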