[Open] tednas opened this issue 5 months ago
Are you sure the prompt format is being correctly applied?
system
You are a very helpful AI assistant!<|im_end|>
user
Extract all the edible items from the given context below.
Then categorize these extracted items using the Food Pyramid as a guide like "Pizza: Carbohydrate" or "Salad: Vegetable"
Only list them without any extra explanation.
Context=This is my ideal Thanksgiving dinner, a roasted turkey with stuffing and mashed potatoes on the side plus a hearty salad and plenty of cranberry sauce<|im_end|>
assistant
Where did the <|im_start|> tags go? Also, you probably want an extra \n at the end of the prompt.
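For reference, a correctly formed ChatML prompt would look something like this (a minimal sketch based on the template quoted further down in this thread; the variable contents are just placeholders):

# ChatML wraps every turn in <|im_start|>role ... <|im_end|> markers,
# and the prompt should end by opening the assistant turn, with a trailing \n.
system_prompt = "You are a very helpful AI assistant!"
user_prompt = "Extract all the edible items from the given context below."

prompt = (
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)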
It's normal for the speed to decrease as you build up a context. Seems to be dropping a little quicker than I'd expect, though. If you're on Linux, installing flash-attn can help a bunch with that, but it's a little harder to get working on Windows.
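If you want to confirm whether flash-attn is actually usable in your environment, a quick import check is enough (a minimal sketch; I'm assuming exllamav2 falls back to its default attention path when the import fails):

# quick sanity check for flash-attn availability
# (assumption: the package exposes __version__, which current releases do)
try:
    import flash_attn
    print("flash-attn available:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed; default attention path will be used")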
Thanks @turboderp for the fast response, your maintenance is awesome
You are right about the prompt template; that's why I started with chat.py, since everything is applied properly within that code.
Now I am using the format from one of your previous responses in other issues.
Although the <|im_start|> tags are there before calling the generator, I do not see them in the response:
# prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{user_prompt}<|im_end|>\n<|im_start|>assistant\n"
def format(prompt, response, system_prompt, settings):
    # Build a ChatML prompt: optional system turn, then the user turn,
    # then open the assistant turn (and close it if a response is given)
    text = ""
    if system_prompt and system_prompt.strip() != "":
        text += "<|im_start|>system\n"
        text += system_prompt
        text += "\n<|im_end|>\n"
    text += "<|im_start|>user\n"
    text += prompt
    text += "<|im_end|>\n"
    text += "<|im_start|>assistant\n"
    if response:
        text += response
        text += "<|im_end|>\n"
    return text

prompt = format(user_prompt, None, system_prompt, settings)
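One thing worth checking when the markers vanish from the echoed text, offered as an assumption rather than a confirmed diagnosis: exllamav2's tokenizer only maps <|im_start|>/<|im_end|> to their special token ids when special-token encoding is enabled, and drops special tokens when decoding. A hedged sketch, assuming the encode_special_tokens/decode_special_tokens keyword arguments present in recent exllamav2 builds:

# assumption: these keyword arguments exist on your exllamav2 version's
# generate_simple(); if not, encode manually with
# tokenizer.encode(prompt, encode_special_tokens = True)
output = generator.generate_simple(
    prompt,
    settings,
    num_tokens = 250,
    encode_special_tokens = True,   # tokenize <|im_start|>/<|im_end|> as special tokens
    decode_special_tokens = True,   # keep them visible in the decoded output for debugging
)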
Response
******************************
Log for prompt before calling generator
<|im_start|>system
You are a very helpful AI assistant!
<|im_end|>
<|im_start|>user
Extract all the edible items from the given context below.
Then categorize these extracted items using the Food Pyramid as a guide like "Pizza: Carbohydrate" or "Salad: Vegetable"
Only list them without any extra explanation.
Context=This is my ideal Thanksgiving dinner, a roasted turkey with stuffing and mashed potatoes on the side plus a hearty salad and plenty of cranberry sauce<|im_end|>
<|im_start|>assistant
******************************
system
You are a very helpful AI assistant!
<|im_end|>
user
Extract all the edible items from the given context below.
Then categorize these extracted items using the Food Pyramid as a guide like "Pizza: Carbohydrate" or "Salad: Vegetable"
Only list them without any extra explanation.
Context=This is my ideal Thanksgiving dinner, a roasted turkey with stuffing and mashed potatoes on the side plus a hearty salad and plenty of cranberry sauce<|im_end|>
assistant
Here's your categorized list:
- Roasted turkey: Protein
- Stuffing: Carbohydrate
- Mashed potatoes: Carbohydrate
- Hearty salad: Vegetable
- Cranberry sauce: Fruit
Remember, the Food Pyramid can vary by culture and country. This is just an example. Please consult a nutritionist for accurate information. Always consult a doctor before making significant changes to your diet. The Food Pyramid is a general guideline, and personal dietary needs may vary. Items can be classified differently based on specific nutritional requirements. Please consider these factors when using this information. This categorization is just for reference purposes only. Always consult a professional for dietary advice. This response is not intended to replace professional medical advice. Always consult a healthcare provider before making major changes to your diet. Different cultures and regions have different food pyramids so it's best to consult a local dietitian or nutritionist for accurate information. This categorization is intended only as a general guideline and should not replace the advice of a qualified healthcare professional. Always consult a health professional before making significant changes to
Response generated in 5.20 seconds, 250 tokens, 48.05 tokens/second
@turboderp I need some advice to focus on the right direction: inference.py or chat.py?
For evaluating many independent prompts I would consider batching. Depending on how much VRAM you have and how long the prompts and replies end up being, you could evaluate tens to hundreds at once.
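A rough sketch of what that batched evaluation could look like, assuming the standard exllamav2 Python API of this era (the model path, the batch size of 32, and the user_prompts/system_prompt variables are placeholders):

from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/dolphin-2.6-mistral-7b"  # hypothetical local path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, batch_size = 32, lazy = True)  # batch size bounded by VRAM
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()

# generate_simple accepts a list of prompts and evaluates them as one batch
batch = [format(p, None, system_prompt, settings) for p in user_prompts[:32]]
outputs = generator.generate_simple(batch, settings, num_tokens = 250)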
It should also be possible to get the same output from inference.py. There's likely just some subtle difference in how the template is applied or how it's encoded, or possibly the sampling parameters are a little different.
As for the speed of chat.py, the reason it slows down is that it's accumulating a context, but that's not relevant if your queries are all independent. You can just reset the context. For chat.py the command line argument would be --amnesia
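For a script-driven loop, the analogous reset is just zeroing the cache position between prompts; a minimal sketch, reusing the generator set up in the snippet above:

# loop over independent prompts, dropping any accumulated context each time
# (note: generate_simple typically resets the cache itself; the explicit
# reset below is what matters if you drive a streaming generator yourself)
for user_prompt in user_prompts:
    cache.current_seq_len = 0  # the script-level analogue of chat.py's --amnesia
    prompt = format(user_prompt, None, system_prompt, settings)
    print(generator.generate_simple(prompt, settings, num_tokens = 250))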
That's great, thank you so much for the detailed response :) Gonna test them.
Hi @turboderp, thanks for the great tool.
For a use case of 1000 prompts, I am experimenting with two scripts, chat.py and inference.py, using the model dolphin-2.6-mistral-7B-GPTQ with args.mode = "chatml".
While chat.py produces great quality responses, as the number of prompt iterations grows the generation speed drops drastically; I have provided the response times below.
Observation 1: Resetting the cache with cache.current_seq_len = 0 has no impact.
Observation 2: VRAM usage is stable at ~9 GB, about 60% of capacity.
Response: (screenshot of response times not shown)
Part 2 of the question: while I can get a great response from chat.py, using inference.py and the same ChatML prompt template the result is not concise at all. (Screenshots of both results not shown.)
@turboderp any recommendations are greatly appreciated :)