Open wangyeye66 opened 2 months ago
I've never tried this library, but maybe `ollama.chat` works like the terminal and `ollama.generate` is like autocomplete?
I have the same problem: the results I get from running `ollama run xxmodel` in the terminal are much better than the results I get from the Python SDK's `client.chat`. Why?
Same problem here. Using `ollama run custom_model` in the terminal gives a much better result than `ollama.chat(model='custom_model', ...)`.
It seems to me that the few-shot examples provided in the Modelfile used to build `custom_model` are not passed to the model when using `ollama.chat`. The result looks just like the base model that the custom model is built on.
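If the Modelfile's few-shot turns really aren't being applied, one workaround is to pass them explicitly in the `messages` list. A minimal sketch, assuming the `ollama` Python package is installed and a local server is running; the model name and example pairs below are placeholders, not anything from a real Modelfile:

```python
# Sketch: supply few-shot examples explicitly instead of relying on the Modelfile.

def build_messages(examples, user_input):
    """Turn (input, output) example pairs into alternating user/assistant turns."""
    messages = []
    for inp, out in examples:
        messages.append({"role": "user", "content": inp})
        messages.append({"role": "assistant", "content": out})
    messages.append({"role": "user", "content": user_input})
    return messages

examples = [
    ("Joon Park is brewing coffee.", "(Joon Park, brew, coffee)"),
    ("Jane Cook is sleeping.", "(Jane Cook, is, sleep)"),
]
messages = build_messages(examples, "John Doe is drinking coffee.")

try:
    import ollama  # placeholder model name; requires a running Ollama server
    response = ollama.chat(model="custom_model", messages=messages)
    print(response["message"]["content"])
except Exception as exc:  # no package / no server in this environment
    print(f"(skipping live call: {exc})")
```

This sidesteps the Modelfile entirely, so it should behave identically whether or not the custom model's template is being picked up.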
@wangyeye66 can you paste the output you get from the CLI and the output from `ollama.chat`?
From what I can tell, this behavior is expected. llama2:7b implements a chat template that uses these messages to simulate a user/assistant exchange, regardless of which method is used to interact with the LLM: the CLI, `ollama.generate`, or `ollama.chat`. Here's (roughly) what your prompt will produce as input to the LLM:
```
[INST] <<SYS>><</SYS>> Task: Turn the input into (subject, predicate, object).
Input: Sam Johnson is eating breakfast.
Output: (Dolores Murphy, eat, breakfast)
Input: Joon Park is brewing coffee.
Output: (Joon Park, brew, coffee)
Input: Jane Cook is sleeping.
Output: (Jane Cook, is, sleep)
Input: Michael Bernstein is writing email on a computer.
Output: (Michael Bernstein, write, email)
Input: Percy Liang is teaching students in a classroom.
Output: (Percy Liang, teach, students)
Input: Merrie Morris is running on a treadmill.
Output: (Merrie Morris, run, treadmill)
Input: John Doe is drinking coffee.
Output: (John Doe, [/INST]
```
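For illustration, the wrapping shown above can be reproduced with plain string formatting. This is a rough sketch of the llama2-style template as rendered in this thread, not Ollama's actual template implementation:

```python
# Sketch: approximate how a llama2-style chat template wraps a single user turn.
def llama2_prompt(user_message, system=""):
    """Wrap one user message in [INST] / <<SYS>> markers (simplified)."""
    return f"[INST] <<SYS>>{system}<</SYS>> {user_message} [/INST]"

prompt = llama2_prompt("Input: John Doe is drinking coffee. Output: (John Doe,")
print(prompt)
```

The point is that your few-shot completion prompt ends up inside a single `[INST] ... [/INST]` turn, so the model treats it as one chat message to reply to rather than text to continue.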
Based on your prompt, you're probably more interested in the text-completion model, llama2:7b-text, which does not template the input.
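A hedged sketch of the two usual ways to get an untemplated completion, assuming a local Ollama server with the relevant models already pulled (recent versions of the ollama Python client also expose a `raw` flag on `generate`; check your installed version):

```python
# Sketch: ask for a plain text completion instead of a chat-style reply.
prompt = "Input: John Doe is drinking coffee. Output: (John Doe,"

try:
    import ollama  # requires a running Ollama server and the models pulled locally

    # Option 1: use a text-completion model that applies no chat template.
    r1 = ollama.generate(model="llama2:7b-text", prompt=prompt)

    # Option 2: bypass the chat template on the chat model,
    # if your client version supports the `raw` flag.
    r2 = ollama.generate(model="llama2:7b", prompt=prompt, raw=True)

    print(r1["response"])
    print(r2["response"])
except Exception as exc:  # no package / no server in this environment
    print(f"(skipping live call: {exc})")
```

Either way the model sees your few-shot prompt verbatim and should continue the pattern rather than answer conversationally.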
@BowenKwan your issue appears different so I'll respond in #188
I use llama2 7b for text generation. The prompt I attempted: """Task: Turn the input into (subject, predicate, object). Input: Sam Johnson is eating breakfast. Output: (Dolores Murphy, eat, breakfast) Input: Joon Park is brewing coffee. Output: (Joon Park, brew, coffee) Input: Jane Cook is sleeping. Output: (Jane Cook, is, sleep) Input: Michael Bernstein is writing email on a computer. Output: (Michael Bernstein, write, email) Input: Percy Liang is teaching students in a classroom. Output: (Percy Liang, teach, students) Input: Merrie Morris is running on a treadmill. Output: (Merrie Morris, run, treadmill) Input: John Doe is drinking coffee. Output: (John Doe,"""
Using `ollama.generate` produces a chat-style reply rather than continuing the text. In the terminal, the model seems to understand what I want. Did I call the wrong function in Python? How can I let the model know I don't want a chat-like response?