ollama / ollama-python

Ollama Python library

Question: the response I got by using terminal is way better than using ollama.generate #117

Open wangyeye66 opened 2 months ago

wangyeye66 commented 2 months ago

I use llama2 7b for text generation. The prompt I attempted: """Task: Turn the input into (subject, predicate, object). Input: Sam Johnson is eating breakfast. Output: (Dolores Murphy, eat, breakfast) Input: Joon Park is brewing coffee. Output: (Joon Park, brew, coffee) Input: Jane Cook is sleeping. Output: (Jane Cook, is, sleep) Input: Michael Bernstein is writing email on a computer. Output: (Michael Bernstein, write, email) Input: Percy Liang is teaching students in a classroom. Output: (Percy Liang, teach, students) Input: Merrie Morris is running on a treadmill. Output: (Merrie Morris, run, treadmill) Input: John Doe is drinking coffee. Output: (John Doe,"""

Using ollama.generate produces a chat-style reply rather than continuing the text. In the terminal, the model seems to understand what I want it to do. Did I call the wrong function in Python? How can I let the model know I don't need a chat-like response?
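For reference, a minimal sketch of what that call might look like with this library (the model tag is assumed; the prompt is abridged from the one above):

```python
import ollama

# Few-shot completion prompt from the question, abridged here for brevity.
prompt = """Task: Turn the input into (subject, predicate, object).
Input: Joon Park is brewing coffee.
Output: (Joon Park, brew, coffee)
Input: John Doe is drinking coffee.
Output: (John Doe,"""

# ollama.generate takes a single prompt string and returns a dict
# whose 'response' field holds the generated text.
result = ollama.generate(model='llama2:7b', prompt=prompt)
print(result['response'])
```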

93andresen commented 2 months ago

I've never tried this library, but maybe "ollama.chat" works like the terminal and "ollama.generate" is like autocomplete?
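For comparison, a chat-style call with the same library looks roughly like this (the message content here is only illustrative):

```python
import ollama

# ollama.chat takes a list of role/content messages instead of a raw prompt
# string; the reply text is under response['message']['content'].
response = ollama.chat(
    model='llama2:7b',
    messages=[{'role': 'user', 'content': 'Turn "John Doe is drinking coffee." into (subject, predicate, object).'}],
)
print(response['message']['content'])
```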

ioo0s commented 4 weeks ago

I have the same problem: the results I get from running ollama run xxmodel in the terminal are much better than the results I get from the Python SDK's client.chat. Why?

BowenKwan commented 2 weeks ago

Same problem here. Using ollama run custom_model in the terminal gives a much better result than ollama.chat(model='custom_model').

It seems to me that the few-shot examples provided in the Modelfile used to build custom_model are not passed to the model when using ollama.chat. The result looks just like using the base model that the custom model is built on. A possible workaround is sketched below.
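One possible workaround, sketched here under the assumption that the few-shot examples are available as plain text, is to pass them explicitly as prior messages rather than relying on the Modelfile:

```python
import ollama

# Hypothetical few-shot history passed explicitly as alternating
# user/assistant turns, in case the Modelfile examples are not applied.
few_shot = [
    {'role': 'user', 'content': 'Input: Joon Park is brewing coffee.'},
    {'role': 'assistant', 'content': 'Output: (Joon Park, brew, coffee)'},
]

response = ollama.chat(
    model='custom_model',  # placeholder name from the comment above
    messages=few_shot + [{'role': 'user', 'content': 'Input: John Doe is drinking coffee.'}],
)
print(response['message']['content'])
```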

mxyng commented 2 weeks ago

@wangyeye66 can you paste the output you get from the CLI and the output from ollama.chat?

From what I can tell, this behavior is expected. llama2:7b implements a chat template which wraps these messages to simulate a user/assistant exchange. This happens regardless of which method is used to interact with the LLM: the CLI, ollama.generate, or ollama.chat. Here's (roughly) what your prompt will produce as input to the LLM:

[INST] <<SYS>><</SYS>> Task: Turn the input into (subject, predicate, object).
Input: Sam Johnson is eating breakfast.
Output: (Dolores Murphy, eat, breakfast)
Input: Joon Park is brewing coffee.
Output: (Joon Park, brew, coffee)
Input: Jane Cook is sleeping.
Output: (Jane Cook, is, sleep)
Input: Michael Bernstein is writing email on a computer.
Output: (Michael Bernstein, write, email)
Input: Percy Liang is teaching students in a classroom.
Output: (Percy Liang, teach, students)
Input: Merrie Morris is running on a treadmill.
Output: (Merrie Morris, run, treadmill)
Input: John Doe is drinking coffee.
Output: (John Doe, [/INST]

Based on your prompt, you're probably more interested in the text completion model, llama2:7b-text, which does not template the input.
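A minimal sketch of that suggestion (assuming llama2:7b-text has been pulled locally; the prompt is again abridged):

```python
import ollama

# The untemplated text-completion variant: the prompt below is sent to the
# model as-is instead of being wrapped in the [INST] chat template above.
prompt = (
    "Task: Turn the input into (subject, predicate, object).\n"
    "Input: Joon Park is brewing coffee.\n"
    "Output: (Joon Park, brew, coffee)\n"
    "Input: John Doe is drinking coffee.\n"
    "Output: (John Doe,"
)

result = ollama.generate(model='llama2:7b-text', prompt=prompt)
print(result['response'])
```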

mxyng commented 2 weeks ago

@BowenKwan your issue appears to be different, so I'll respond in #188.