Trying to understand GSM code

madaan / self-refine

LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

https://selfrefine.info

Apache License 2.0


Trying to understand GSM code #2

Closed. Etrama closed this issue 1 year ago.

Etrama commented 1 year ago

Hello! First of all, this is a super nice paper.

I am trying to wrap my head around the concept of the paper. What I don't understand is this: No matter what the output from the LM is, the LM is prompted again with the same question and the generated code/text (code+comments), until the LM itself says "it is correct", with a maximum of max_attempts for each question? The paper reports improvements over 5 iterations, so if the model outputs "it is correct" the same output is used for the next iteration? Just want to make sure I understood this correctly.

madaan commented 1 year ago

Sorry about the late response.

> No matter what the output from the LM is, the LM is prompted again with the same question and the generated code/text (code+comments), until the LM itself says "it is correct", with a maximum of max_attempts for each question?

Right.

> so if the model outputs "it is correct"

In such cases, the self-refine loop can stop; see line 3 of the algorithm.
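
For anyone else reading this thread, here is a minimal sketch of the loop as described above. It is not the repository's actual GSM code: `self_refine`, `generate`, `feedback`, and `refine` are hypothetical names, the three callables stand in for LM calls, and the stop check is assumed to be a simple substring match on "it is correct". Only `max_attempts` comes from the question.

```python
from typing import Callable, List, Tuple


def self_refine(
    question: str,
    generate: Callable[[str], str],            # hypothetical: initial LM call
    feedback: Callable[[str, str], str],       # hypothetical: LM critiques a solution
    refine: Callable[[str, str, str], str],    # hypothetical: LM rewrites using feedback
    max_attempts: int = 5,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Iterate feedback -> refine until the feedback says the solution is correct."""
    solution = generate(question)
    history: List[Tuple[str, str]] = []
    for _ in range(max_attempts):
        fb = feedback(question, solution)
        history.append((solution, fb))
        # Assumed stop condition: the model itself declares the solution correct.
        if "it is correct" in fb.lower():
            break
        solution = refine(question, solution, fb)
    return solution, history


# Toy stand-ins for the LM, used only to exercise the loop.
def toy_generate(question: str) -> str:
    return "answer = 2 + 3"


def toy_feedback(question: str, solution: str) -> str:
    if solution.endswith("2 * 3"):
        return "It is correct."
    return "The operator should be multiplication."


def toy_refine(question: str, solution: str, fb: str) -> str:
    return solution.replace("+", "*")


final, history = self_refine(
    "What is 2 times 3?", toy_generate, toy_feedback, toy_refine
)
```

In this sketch the loop exits as soon as the feedback contains "it is correct", so the last solution is simply returned as the final answer rather than being refined again. If the feedback never says so, the loop runs `max_attempts` times and returns whatever the last refinement produced.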