Closed huliang2016 closed 1 month ago
We ran some experiments along these lines in the early stages of the project. We saw some gains, but the performance depends heavily on which model serves as the aggregator. Smaller models tend to be less capable at aggregation, and some even decrease performance. Of course, the models we used back then were very different from the strong small models available now, so results may differ considerably with Gemma 2, Llama 3.1 8B, etc.
thanks
hi, nice work~
I'm quite interested in how this approach performs with smaller language models, for example using three 7B models as proposers and a 14B model as the aggregator. Have you made any attempts in this direction?
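For readers unfamiliar with the setup being asked about, here is a minimal runnable sketch of the proposer/aggregator flow. The model calls are mocked and all model names (`7b-model-a`, `14b-model`, etc.) are hypothetical placeholders, not anything from this repo:

```python
def call_model(model_name, prompt):
    # Placeholder: a real implementation would query the named LLM here.
    return f"[{model_name}] answer to: {prompt}"

def aggregate(aggregator_name, prompt, proposals):
    # The aggregator sees the original prompt plus all proposer outputs
    # and is asked to synthesize a single final answer.
    joined = "\n".join(f"Proposal {i + 1}: {p}" for i, p in enumerate(proposals))
    agg_prompt = (
        f"{prompt}\n\nCandidate answers:\n{joined}\n\nSynthesize the best answer."
    )
    return call_model(aggregator_name, agg_prompt)

# Three small proposers feeding one larger aggregator, as in the question above.
proposers = ["7b-model-a", "7b-model-b", "7b-model-c"]
aggregator = "14b-model"

prompt = "Explain mixture-of-agents in one sentence."
proposals = [call_model(name, prompt) for name in proposers]
final = aggregate(aggregator, prompt, proposals)
print(final)
```

The open question in the thread is whether the 14B aggregator is capable enough to benefit from the 7B proposals, rather than being dragged down by them.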