Closed yananchen1989 closed 1 week ago
Your understanding is correct: each LLM in the 2nd layer takes the original prompt together with the concatenated outputs of the 1st layer. We additionally use an "aggregate" prompt template to combine them. The output of each LLM in the 2nd layer is therefore a refined, aggregated response built from the 1st-layer outputs.
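For readers who want to see the data flow, here is a minimal sketch of the layering described in this reply. It is not the project's actual code: the `Model` type, `AGGREGATE_TEMPLATE` wording, `build_aggregate_prompt`, `run_layers`, and `make_stub` are all hypothetical names made up for illustration, and the stub models stand in for real LLM calls.

```python
from typing import Callable, List

# Hypothetical model type: a function from a prompt string to a response string.
Model = Callable[[str], str]

# Hypothetical wording; the real "aggregate" template is not quoted in this thread.
AGGREGATE_TEMPLATE = (
    "Synthesize the candidate responses below into a single refined answer "
    "to the user's query.\n\n"
    "Candidate responses:\n{responses}\n\n"
    "User query: {prompt}"
)


def build_aggregate_prompt(prompt: str, previous: List[str]) -> str:
    """Combine the original prompt with all outputs of the previous layer."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(previous, start=1))
    return AGGREGATE_TEMPLATE.format(responses=numbered, prompt=prompt)


def run_layers(prompt: str, layers: List[List[Model]]) -> List[str]:
    """Run every layer in order and return the responses of the last layer."""
    outputs: List[str] = []
    for depth, layer in enumerate(layers):
        # Layer 1 sees only the original prompt; every later layer sees the
        # original prompt plus the concatenated outputs of the layer before it.
        layer_input = prompt if depth == 0 else build_aggregate_prompt(prompt, outputs)
        # Each model in the layer answers independently from the same input.
        outputs = [model(layer_input) for model in layer]
    return outputs


def make_stub(name: str) -> Model:
    """Stand-in for a real LLM call; it only reports the size of its input."""
    return lambda text: f"{name} answered using {len(text)} input characters"


if __name__ == "__main__":
    layer1 = [make_stub(f"L1-model{i}") for i in range(3)]
    layer2 = [make_stub(f"L2-model{i}") for i in range(5)]
    for response in run_layers("What is a mixture of agents?", [layer1, layer2]):
        print(response)
```

In this sketch, a layer with N models always yields N responses, and every model in a layer receives the same input, so the responses differ only because the models differ.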
OK, I see. So if, for example, there are 5 LLMs in layer 2, there will be 5 refined/aggregated responses, each of which is a direct answer to the initial prompt.
That's correct.
Hello, I cannot follow the mechanism of the intermediate layers described in the paper. The first layer is easy to understand: each LLM takes the same prompt and generates its response separately. But how do the following layers work? For example, does each LLM in the second layer take the original prompt and the concatenation of the first layer's outputs? If so, what is the output of each LLM in the 2nd layer?
Thanks.