Closed huliang2016 closed 1 month ago
We ran some experiments along these lines in the early stages of the project. We saw some gains, but the performance depends heavily on which model serves as the aggregator. Smaller models tend to be less capable at aggregation, and some even decrease performance. Of course, the models we used back then were very different from the strong small models available now, so results may differ considerably with Gemma 2, Llama 3.1 8B, etc.
thanks
hi, nice work~
I'm quite interested in how this approach performs with smaller language models, for example using three 7B models as proposers and a 14B model as the aggregator. Have you made any attempts in this direction?
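For readers unfamiliar with the setup being asked about, here is a minimal runnable sketch of the proposer/aggregator flow. The model calls are mocked and all model names (`7b-model-a`, `14b-model`, etc.) are hypothetical placeholders, not anything from this repo:

```python
def call_model(model_name, prompt):
    # Placeholder: a real implementation would query the named LLM here.
    return f"[{model_name}] answer to: {prompt}"

def aggregate(aggregator_name, prompt, proposals):
    # The aggregator sees the original prompt plus all proposer outputs
    # and is asked to synthesize a single final answer.
    joined = "\n".join(f"Proposal {i + 1}: {p}" for i, p in enumerate(proposals))
    agg_prompt = (
        f"{prompt}\n\nCandidate answers:\n{joined}\n\nSynthesize the best answer."
    )
    return call_model(aggregator_name, agg_prompt)

# Three small proposers feeding one larger aggregator, as in the question above.
proposers = ["7b-model-a", "7b-model-b", "7b-model-c"]
aggregator = "14b-model"

prompt = "Explain mixture-of-agents in one sentence."
proposals = [call_model(name, prompt) for name in proposers]
final = aggregate(aggregator, prompt, proposals)
print(final)
```

The open question in the thread is whether the 14B aggregator is capable enough to benefit from the 7B proposals, rather than being dragged down by them.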