tolitius / jemma

jemma & her ai agents that build software

Separate models for each agent #5

Open R-Dson opened 7 months ago

R-Dson commented 7 months ago

Nice project. Open-source models are usually good at one task, such as coding or writing, so it would be interesting if we could set a parameter in the .env to specify which model each agent should use.

As an example, this could be done by adding a variable to .env such as ENGINEER_MODEL_NAME="gemma:7b-instruct-v1.1-fp16", and the same for the other agents. If none is set, fall back to the default. So if you run jemma --prompt "Trivia Game" --build-prototype --ollama dolphin-mistral:7b-v2.6-dpo-laser-fp16 with ENGINEER_MODEL_NAME set to gemma in .env, the engineer uses gemma and all the other agents use dolphin-mistral.

This could potentially perform better than using only one local model, at the expense of the time spent switching models, so it should be optional.
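A rough sketch of what that fallback could look like (a hypothetical helper, not jemma's actual code):

import os

def model_for(agent, default_model):
    # e.g. ENGINEER_MODEL_NAME overrides the model for the engineer agent;
    # any agent without its own variable falls back to the model passed on the CLI
    return os.environ.get(f"{agent.upper()}_MODEL_NAME", default_model)

With ENGINEER_MODEL_NAME="gemma:7b-instruct-v1.1-fp16" in .env, model_for("engineer", "dolphin-mistral:7b-v2.6-dpo-laser-fp16") would return the gemma model while every other agent keeps dolphin-mistral.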

tolitius commented 7 months ago

great idea! I was trying to figure out how to mix models better

I tried to generate requirements with Claude:

jemma --prompt "Trivia Game" --build-prototype --claude

and then write code with Ollama (codegemma, deepseek coder, etc.)

jemma --requirements requirements/learning-portal.2024-04-11.16-11-56-382.txt --build-prototype --ollama codegemma:7b-instruct-fp16

but so far I think local instruct / code models are trained to take in short instructions, rather than something like detailed requirements, and to generate short passages of code

I am sure this will change for the better in the very near future, and I am also looking at meta prompting it better


as to your idea, I think a model should be assigned to an activity rather than a role: a business owner and an engineer each perform several distinct activities (turning an idea into requirements, reviewing user stories, refactoring code, etc.)

your idea still applies, but I think it makes sense to assign a model to each activity and, if one is unassigned, fall back to the top-level model:

[
 {"task": "idea-to-requirements",
  "model": "claude-3-haiku-20240307",
  "provider": "claude"},

 {"task": "review-user-story",
  "model": "mistral:7b-instruct-v0.2-fp16",
  "provider": "ollama"},

 {"task": "refactor-code",
  "model": "deepseek-coder:6.7b-instruct-fp16",
  "provider": "ollama"},
]
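a minimal sketch of that lookup (a hypothetical helper; field names taken from the config above):

def model_for_task(task, task_models, default_model, default_provider):
    # return the model assigned to the activity, or the top-level model if unassigned
    for entry in task_models:
        if entry["task"] == task:
            return entry["model"], entry["provider"]
    return default_model, default_provider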
R-Dson commented 7 months ago

Yes, that seems like a reasonable approach.

As for Ollama and short instructions: you might need to add the num_ctx parameter to the request options to increase the context window. I believe the default is 2048.
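For reference, Ollama takes num_ctx in the "options" field of a request; a minimal example against a local Ollama (the model name is just an example):

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral:7b-instruct-v0.2-fp16",
        "prompt": "Write a hello world in Python",
        "stream": False,
        # raise the context window above the 2048-token default
        "options": {"num_ctx": 8192},
    },
)
print(response.json()["response"])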

tolitius commented 7 months ago

yep, updated per docs

R-Dson commented 7 months ago

Nice. You could also include the context length in this config, since you may want a 7b model to use a longer context than a 33b model, etc.:


 {"task": "idea-to-requirements",
  "model": "claude-3-haiku-20240307",
  "provider": "claude"},

 {"task": "review-user-story",
  "model": "mistral:7b-instruct-v0.2-fp16",
  "provider": "ollama",
  "context": "16384"},

 {"task": "refactor-code",
  "model": "deepseek-coder:33b-instruct-fp16",
  "provider": "ollama",
  "context": "4096"},
]
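a sketch of how the optional per-task context could be wired into the Ollama options (hypothetical, building on the lookup sketched earlier):

def ollama_options(entry, default_ctx=2048):
    # use the task's context window if set, otherwise Ollama's default
    return {"num_ctx": int(entry.get("context", default_ctx))}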