yananchen1989 opened 3 months ago
Here I paste one example for your information. In GSM8K, one of the examples' oracle answer is:
Since each apple costs $4, it means that on Monday, Lucy picked 56/4 = <<56/4=14>>14 apples
Combining Monday and Tuesday, Lucy picked 14 + 12 = <<14+12=26>>26 apples
On Wednesday, she picked 2 * 12 = <<2*12=24>>24 apples
All together, Lucy picked 26 + 24 = <<26+24=50>>50 apples.
#### 50
And llama-3.1-8B's answer is (which is correct)
On Monday, Lucy sold 56/4 = 14 apples.
On Tuesday, she picked 12 apples.
On Wednesday, she picked double the number of apples she did the previous day, which is 2*12 = 24 apples.
Total apples picked over the three days = 14 + 12 + 24 = 50 apples.
#### 50
But when applying MoA, llama-3.1-8B takes not only the original question but also three answers from the intermediate layer. Here mistral-v0.1's answer is (which is wrong):
On Monday, Lucy sold all the apples picked, so she picked 12 apples on Tuesday and 24 apples on Wednesday.
The total number of apples picked over the three days is 12 + 24 + 24 = <<12+24+24=60>>60 apples.
The total amount of money Lucy got from selling the apples picked over the three days is $4/piece * 60 pieces = $<<4*60=240>>240.
The total amount of money Lucy got from selling the apples picked on Monday is $56.
Therefore, the total amount of money Lucy got from selling the apples picked over the three days is $56 + $240 = $<<56+240=296>>296.
#### 296
mistral-v0.2's answer is (which is wrong):
Let's denote the number of apples picked on Monday as x.
We know that x = $56 / $4 = <<56/4=14>>14 apples
On Tuesday, Lucy picked 12 apples.
On Wednesday, she picked double the number of apples she picked on Tuesday, so she picked 12 * 2 = <<12*2=24>>24 apples.
The total number of apples picked over the three days is 14 + 12 + 24 = <<14+12+24=42>>42 apples.
#### 42
mistral-v0.3's answer is (which is correct):
Let's think step by step:
1. On Monday, Lucy sold all the apples picked, and she got $56. So, the number of apples picked on Monday is $56 / $4 = <<56/4=14>>14 apples.
2. On Tuesday, Lucy picked 12 apples.
3. On Wednesday, Lucy picked double the number of apples she did the previous day, which is 12 * 2 = <<12*2=24>>24 apples.
4. To find the total number of apples picked over the three days, we add the number of apples picked on each day: 14 + 12 + 24 = <<14+12+24=50>>50 apples.
#### 50
After ingesting the answers from the Mistral models, llama-3.1 changes its answer to the following (which is wrong):
Let's think step by step:
1. On Monday, Lucy sold all the apples picked, and she got $56. So, the number of apples picked on Monday is $56 / $4 = 14 apples.
2. On Tuesday, Lucy picked 12 apples.
3. On Wednesday, Lucy picked double the number of apples she did the previous day, which is 12 * 2 = 24 apples.
4. To find the total number of apples picked over the three days, we add the number of apples picked on each day: 14 + 12 + 24 = 50 apples.
However, the problem states that on Monday, she sold all the apples picked, and on Tuesday, she picked 12 apples. This means that the number of apples picked on Monday is actually 56 / 4 = 14, but this is the number of apples sold, not picked. The number of apples picked on Monday is actually unknown.
Let's denote the number of apples picked on Monday as x. We know that x - 12 = 14, because she sold 14 apples on Monday, and she picked 12 apples on Tuesday. Solving for x, we get x = 26.
On Tuesday, Lucy picked 12 apples.
On Wednesday, she picked double the number of apples she picked on Tuesday, so she picked 12 * 2 = 24 apples.
The total number of apples picked over the three days is 26 + 12 + 24 = 62 apples.
#### 62
Across the other test cases, mistral-v0.3 is not always the most accurate one; it does outperform v0.1 and v0.2 slightly, but not by a significant margin. In my setting, I simply simulate the full MoA setup, where the aggregator is the most capable model while the LLMs in the intermediate layers are weaker.
I expected that with MoA, i.e., with the intermediate opinions, the final aggregator would produce better responses compared to the version without them.
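To be concrete, the aggregation step in my setup works roughly like the sketch below: the reference answers from the intermediate layer are packed into the aggregator's system prompt together with an aggregate-and-synthesize instruction. The helper name and prompt wording here are illustrative, not my exact code.

```python
# Minimal sketch of how intermediate answers are handed to the aggregator in a
# single-layer MoA setup. The prompt wording is illustrative and may differ from
# the official MoA aggregate-and-synthesize prompt.
def build_aggregator_messages(question: str, reference_answers: list[str]) -> list[dict]:
    system = (
        "You have been provided with responses from several models to the user's question. "
        "Synthesize them into a single, accurate answer. Critically evaluate the responses, "
        "since some of them may be wrong."
    )
    # Append each intermediate-layer answer as a numbered reference.
    for i, ans in enumerate(reference_answers, start=1):
        system += f"\n\n{i}. {ans}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```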
Update: now I use stronger LLMs. Intermediate layer:
reference_models = ['microsoft/WizardLM-2-8x22B',
'mistralai/Mixtral-8x7B-Instruct-v0.1',
'Qwen/Qwen2-72B-Instruct',
'meta-llama/Meta-Llama-3-70B-Instruct-Turbo',
'deepseek-ai/deepseek-llm-67b-chat']
aggregator: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
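For reference, one round of this setup is wired roughly as in the sketch below. It assumes an OpenAI-compatible chat endpoint (the base_url, API key, and temperature are placeholders rather than my exact configuration) and reuses the build_aggregator_messages helper sketched earlier.

```python
# Sketch of one single-layer MoA round with the models listed above,
# assuming an OpenAI-compatible endpoint (placeholders, not my exact setup).
from openai import OpenAI

client = OpenAI(base_url="https://api.together.xyz/v1", api_key="YOUR_KEY")

reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "Qwen/Qwen2-72B-Instruct",
    "meta-llama/Meta-Llama-3-70B-Instruct-Turbo",
    "deepseek-ai/deepseek-llm-67b-chat",
]
aggregator = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"

def ask(model: str, messages: list[dict]) -> str:
    resp = client.chat.completions.create(model=model, messages=messages, temperature=0.7)
    return resp.choices[0].message.content

def moa_answer(question: str) -> str:
    # Layer 1 (proposers): each reference model answers the raw question.
    proposals = [ask(m, [{"role": "user", "content": question}]) for m in reference_models]
    # Layer 2 (aggregator): sees the question plus all proposals.
    # build_aggregator_messages is the helper sketched earlier in this post.
    return ask(aggregator, build_aggregator_messages(question, proposals))
```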
The experimental result is that the with-MoA setting does not outperform the without-MoA setting; they perform about equally.
Here I also paste one example from GSM8K
oracle answer:
Pat's flower bed is 111 feet / 1.5 feet per plant = <<111/1.5=74>>74 plants needed.
Pat needs to buy 74 plants - 17 plants = <<74-17=57>>57 plants needed to purchase.
Pat will spend 57 plants * $6 = $<<57*6=342>>342.
#### 342
answer from aggregator directly without MoA:
First, convert the length of the flower bed to inches: 111 feet * 12 inches/foot = 1332 inches.
Each plant takes up 12 inches + 1.5 feet * 12 inches/foot = 12 + 18 = 30 inches.
The total number of plants needed is 1332 inches / 30 inches/plant = 44.4, round up to 45 plants.
Pat already owns 17 flowers, so she needs to buy 45 - 17 = 28 flowers.
Each flower costs $6, so Pat will spend 28 flowers * $6/flower = $168.
#### 168
answer from aggregator with MoA, where the prompt additionally contains the responses from the LLMs in the intermediate layer:
After reviewing the responses from other agents, I agree with agents #1, #3, and #4 that the correct calculation is:
The space needed for each plant, including the gap, is 1 foot (for the plant) + 1.5 feet (for the gap) = 2.5 feet.
The total number of plants Pat can fit in her flower bed is 111 feet / 2.5 feet per plant = 44.4 plants. Since Pat can't plant a fraction of a plant, she can only plant 44 plants.
Pat already owns 17 plants, so she needs to buy 44 - 17 = 27 more plants.
Each flowering plant costs $6, so Pat will spend 27 * $6 = $162.
I disagree with agent #2, who rounded up to 45 plants, and agent #5, who incorrectly calculated the number of plants that can fit in the flower bed.
#### 162
answer from microsoft/WizardLM-2-8x22B in the intermediate layer:
1.5 feet is the space needed between each plant, which includes the space taken by the plant itself. Since the plants grow to be 12 inches wide, that's 1 foot wide. So, the total space needed for each plant including the gap is 1 foot (for the plant) + 1.5 feet (for the gap) = 2.5 feet.
Now, let's calculate how many more plants Pat needs to buy to fill the 111 feet long flower bed. Pat already has 17 plants. Since each plant requires 2.5 feet of space, the total number of spaces (including the gap) that can be filled in a 111 feet long bed is 111 / 2.5 = <<111/2.5=44.4>>44.4, but since we can't have a fraction of a plant, we round down to 44 plants.
Pat already has 17 plants, so she needs to buy 44 - 17 = <<44-17=27>>27 more plants.
Each flowering plant costs $6, so Pat will spend 27 * $6 = $<<27*6=162>>162 on new plants.
#### 162
answer from mistralai/Mixtral-8x7B-Instruct-v0.1 in the intermediate layer:
First, we need to determine how many plants Pat needs to fill her flower bed.
Pat's flower bed is 111 feet long, and her plants grow 12 inches wide, which is equivalent to 1 foot. So, the space that each plant occupies is 1 foot + 1.5 feet = <<1+1.5=2.5>>2.5 feet.
The number of plants Pat needs is 111 feet / 2.5 feet per plant = <<111/2.5=44.4>>44.4 plants.
Since Pat cannot have a fraction of a plant, she will need to round up to the nearest whole number, which is 45 plants.
Pat already owns 17 plants, so she needs to buy 45 - 17 = <<45-17=28>>28 plants.
Each plant costs $6, so Pat will spend $6 * 28 plants = $<<6\*28=168>>168.
#### 168
Note: I assumed that the space between the plants is 1.5 feet between the center of each plant. If the space is between the edges of the plants, the space would be 1.5 feet + 1 foot = 2.5 feet, and the number of plants needed would be 111 feet / 2.5 feet per plant = 44.4 plants, which would still round up to 45 plants. The final answer would still be $168.
answer from Qwen/Qwen2-72B-Instruct in the intermediate layer:
To find out how many flowers Pat can plant in her flower bed, we need to divide the length of the flower bed by the width of each flower plus the space between each flower. The width of each flower is 12 inches, which is equal to 1 foot, and the space between each flower is 1.5 feet. So, the total space needed for each flower is 1 + 1.5 = 2.5 feet.
The length of the flower bed is 111 feet, so Pat can plant 111 / 2.5 = 44.4 flowers. Since Pat can't plant a fraction of a flower, she can only plant 44 flowers.
Pat already owns 17 flowers, so she needs to buy 44 - 17 = 27 more flowers.
Each flowering plant costs $6, so Pat will spend 27 * $6 = $162 at the store to fill up her flower bed.
#### 162
answer from meta-llama/Meta-Llama-3-70B-Instruct-Turbo in the intermediate layer:
First, convert the length of the flower bed from feet to inches: 111 feet * 12 inches/foot = 1332 inches.
The width of each plant is 12 inches, and Pat needs to leave 1.5 feet (18 inches) between each plant, so the total space per plant is 12 + 18 = 30 inches.
The total number of plants Pat can fit in her flower bed is 1332 inches / 30 inches/plant = 44.4 plants. Since Pat already has 17 plants, she needs to buy 44 - 17 = 27 more plants.
Each plant costs $6, so Pat will spend 27 plants * $6/plant = $162.
#### 162
answer from deepseek-ai/deepseek-llm-67b-chat in the intermediate layer:
Pat's flower bed is 111 feet long and she wants to leave 1.5 feet between each plant, so she can fit 111/1.5 = <<111/1.5=74>>74 plants in her flower bed.
Pat already owns 17 flowers, so she needs to buy 74 - 17 = <<74-17=57>>57 more plants.
Each plant costs $6, so Pat will spend 57 * $6 = $<<57*6=342>>342 at the store.
The answer is: 342
This sounds unsurprising after reading the MoA paper (https://arxiv.org/pdf/2406.04692). Table 4 there shows the effect of using different models as either proposers or aggregators: weaker models drop a lot as aggregators while still being useful proposers.
| Model | As aggregator | As proposer |
| --- | --- | --- |
| Qwen1.5-110B-Chat | 61.3% | 56.7% |
| Qwen1.5-72B-Chat | 59.3% | 53.3% |
| LLaMA-3-70b-Instruct | 45.0% | 60.6% |
| WizardLM 8x22B | 52.9% | 63.8% |
| Mixtral-8x22B-Instruct | 48.4% | 54.8% |
| dbrx-instruct | 41.5% | 55.1% |
From experience, when you overwhelm a weak model with a lot of information, it struggles to pick out the useful bits; perhaps something like this is happening in your experiments.
I have thoroughly tested MoA (with one layer) on some objective benchmarks (less subjective than MT-bench), such as GSM8K and HotpotQA. It seems that when the LLMs are at the 7B level, it no longer works. In my setting, the three LLMs in layer one are mistralai/Mistral-7B-Instruct-v0.1/2/3, while the aggregator is meta-llama/Meta-Llama-3.1-8B-Instruct. (Before the experiment, I tested each model's ability to solve the problems on its own; the strongest one is llama-3.1-8B.) When applying MoA, I find that performance decreases. For example, on GSM8K the accuracy drops from 75.1 to 61.3: llama-3.1 alone achieves 75.1 (rounds=0), while 61.3 comes from rounds=1, where the intermediate layer consists of the Mistral-7B v0.1/2/3 models.
This finding also applies to HotpotQA.
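For completeness, the GSM8K accuracy above is just exact match on the final number. A minimal sketch of the scoring is below; the regex is illustrative rather than my exact evaluation code.

```python
# Sketch of GSM8K scoring: pull the number after "####" (falling back to the
# last number in the text when a model omits the marker) and compare it to the
# oracle's final number.
import re

def extract_final_number(text: str) -> str | None:
    m = re.search(r"####\s*\$?([-+]?[0-9][0-9,\.]*)", text)
    if m:
        raw = m.group(1)
    else:
        nums = re.findall(r"[-+]?[0-9][0-9,\.]*", text)
        raw = nums[-1] if nums else None
    # Normalize thousands separators and trailing periods before comparing.
    return raw.replace(",", "").rstrip(".") if raw else None

def is_correct(prediction: str, oracle: str) -> bool:
    return extract_final_number(prediction) == extract_final_number(oracle)

# Accuracy over the test set is the mean of is_correct(pred, oracle), which is
# how the numbers like 75.1 (rounds=0) vs 61.3 (rounds=1) are obtained.
```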
Has anyone else made a similar observation? Any suggestions on how to use 7B-level LLMs?