Closed orangetin closed 6 months ago
Multiple customers are confused that `prompt` in the together Python library sends the raw prompt, whereas the preferred approach through the REST API and the OpenAI package is `messages`, which applies prompt formatting. We therefore want to support `messages` so that all three ways of using our inference API are consistent. More context here: [https://www.notion.so/together-docs/Prompt-template-discrepancy-proposal-a557d4fb7f5d49d59a9b79480e0926b9](https://www.notion.so/together-docs/Prompt-template-discrepancy-proposal-a557d4fb7f5d49d59a9b79480e0926b9)
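To make the discrepancy concrete, here is a minimal sketch of what "prompt formatting" means for a Llama-2 chat model. The helper `format_llama2_chat` is hypothetical and simplified; it is not the library's or the server's actual template code, and the real template may differ in details. It only illustrates why a raw `prompt` string and the same text passed via `messages` can reach the model as different inputs.

```python
from typing import Dict, List


def format_llama2_chat(messages: List[Dict[str, str]]) -> str:
    """Hypothetical, simplified Llama-2 chat template (illustration only)."""
    system = ""
    parts = []
    for m in messages:
        role, content = m["role"], m["content"].strip()
        if role == "system":
            # The system prompt is folded into the next user turn.
            system = f"<<SYS>>\n{content}\n<</SYS>>\n\n"
        elif role == "user":
            parts.append(f"<s>[INST] {system}{content} [/INST]")
            system = ""
        elif role == "assistant":
            parts.append(f" {content} </s>")
        else:
            raise ValueError(f"unknown role: {role}")
    return "".join(parts)


raw_prompt = "What country is Paris in?"
formatted = format_llama2_chat([{"role": "user", "content": raw_prompt}])

print(repr(raw_prompt))  # what a raw `prompt` would send
print(repr(formatted))   # what `messages` would send after templating
```

With `prompt`, the model sees only the bare question; with `messages`, it sees the question wrapped in the instruction markers the chat model was trained on, which is the behavior users of the REST API and OpenAI package already get.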
Here is my async demo:
import asyncio
import os
import time

from together import AsyncTogether, Together

TOGETHER_API_KEY = os.getenv("TOGETHER_API_KEY")


def sync_chat_completion(messages, max_tokens):
    # Send one request at a time and wait for each response.
    client = Together(api_key=TOGETHER_API_KEY)
    start_time = time.time()
    for message in messages:
        response = client.chat.completions.create(
            model="togethercomputer/llama-2-7b-chat",
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": message}],
        )
        print(response.choices[0].message.content)
    end_time = time.time()
    print("Synchronous total execution time:", end_time - start_time, "seconds")


async def async_chat_completion(messages, max_tokens):
    # Create all requests up front, then await them concurrently.
    async_client = AsyncTogether(api_key=TOGETHER_API_KEY)
    start_time = time.time()
    tasks = [
        async_client.chat.completions.create(
            model="togethercomputer/llama-2-7b-chat",
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": message}],
        )
        for message in messages
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)
    end_time = time.time()
    print("Asynchronous total execution time:", end_time - start_time, "seconds")
In a Jupyter notebook (which already runs an event loop, so `await` works at the top level):
messages = ["hi there what is the meaning of life?", "What country is Paris in?"]
sync_chat_completion(messages, 32)
await async_chat_completion(messages, 32)
Otherwise (for example, in a plain Python script):
messages = ["hi there what is the meaning of life?", "What country is Paris in?"]
sync_chat_completion(messages, 32)
asyncio.run(async_chat_completion(messages, 32))
Expected output:
The meaning of life is a question that has puzzled philosophers, theologians, and scientists for centuries. There are many different perspectives
Paris is located in France. It is the capital and largest city of France, situated in the northern central part of the country.
Synchronous total execution time: 0.7738921642303467 seconds
The meaning of life is a question that has puzzled philosophers, theologians, and scientists for centuries. There are many different perspectives
Paris is located in France. It is the capital and largest city of France, situated in the northern central part of the country.
Asynchronous total execution time: 0.4429478645324707 seconds
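The gap between the two timings comes from issuing the requests concurrently instead of one after another. The sketch below reproduces that effect without any network access: `fake_request` is a hypothetical stand-in for an API call that just sleeps, and `timed` is a small illustrative helper, so the numbers say nothing about real API latency.

```python
import asyncio
import time


async def fake_request(message, delay=0.05):
    """Hypothetical stand-in for an API call: waits, then echoes the input."""
    await asyncio.sleep(delay)
    return f"echo: {message}"


async def run_sequential(messages):
    # Each request starts only after the previous one has finished.
    return [await fake_request(m) for m in messages]


async def run_concurrent(messages):
    # All requests are in flight at once; gather preserves input order.
    return await asyncio.gather(*(fake_request(m) for m in messages))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    msgs = ["a", "b", "c", "d"]
    _, t_seq = timed(run_sequential(msgs))
    _, t_con = timed(run_concurrent(msgs))
    print(f"sequential: {t_seq:.3f}s, concurrent: {t_con:.3f}s")
```

Sequential time grows with the number of messages, while concurrent time stays close to the slowest single request. Against a real endpoint, rate limits and server-side queuing can narrow that gap.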
To-Do:
Endpoints to support:
Example usage:
Updated contribution style
Setting up pre-commit for dev: