Open bauersimon opened 4 days ago
@Munsio plz review
How does the next model in the chain know that it needs to repair something and not generate the thing anew?

The task evaluation logic can carry over the "broken code" and just ask the next model to do a "repair". The models themselves don't need to worry about sharing context or what they need to do. The evaluation logic will say "you failed, but you, please fix, this is what we have so far".
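To make the hand-over concrete, here is a rough sketch of evaluation logic that carries the broken code to the next model. Everything in it is an assumption for illustration: the `Model` type, `RepairPrompt`, `RunChain` and the prompt wording are hypothetical and not taken from the actual code base.

```go
package main

import "fmt"

// Model is a hypothetical stand-in for a model: it maps a prompt to generated code.
type Model func(prompt string) string

// RepairPrompt builds the follow-up prompt that carries the broken code over.
// The wording is illustrative only.
func RepairPrompt(task string, brokenCode string) string {
	return fmt.Sprintf("Task: %s\nThe previous attempt failed, please fix it. This is what we have so far:\n%s", task, brokenCode)
}

// RunChain queries the models in order until "valid" accepts the generated code.
// The models never talk to each other: only this function knows about the chain.
func RunChain(task string, models []Model, valid func(code string) bool) (code string, ok bool) {
	prompt := task
	for _, model := range models {
		code = model(prompt)
		if valid(code) {
			return code, true
		}
		// Carry the broken code over and ask the next model for a repair.
		prompt = RepairPrompt(task, code)
	}

	return code, false
}

func main() {
	// Fake models: the first one always produces broken code, the second one a fix.
	first := func(prompt string) string { return "func broken(" }
	second := func(prompt string) string { return "func fixed() {}" }
	valid := func(code string) bool { return code == "func fixed() {}" }

	code, ok := RunChain("write tests", []Model{first, second}, valid)
	fmt.Println(code, ok)
}
```

The second model only ever sees the repair prompt, so it needs no knowledge of who produced the broken code.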
Basically we want to execute the "write-test" task, but then optionally call `symflower repair` on the generated tests. Plus, the scoring should treat both the original "write-test" and the (hopefully) repaired tests as different results.

- a `TaskIdentifier` `write-test-symflower-repair` that represents the write-test task but with symflower applied
- `task.Run` returns `map[TaskIdentifier]Assessments` such that it can return both the unfixed and the fixed results
- the task applies `symflower repair` and executes again
- log `symflower repair` into the original model's log, i.e. `with symflower repair: .......`
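A minimal sketch of what such a `Run` could look like, under stated assumptions: `Assessments` is simplified to named counters, the `execute` and `repair` hooks are hypothetical (`repair` stands in for invoking `symflower repair`), and reporting the same result under both identifiers when nothing needed repair is just one possible choice.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskIdentifier follows the naming of this issue; the constant names are assumptions.
type TaskIdentifier string

const (
	WriteTests                TaskIdentifier = "write-test"
	WriteTestsSymflowerRepair TaskIdentifier = "write-test-symflower-repair"
)

// Assessments is simplified here to named counters.
type Assessments map[string]int

// Run executes the generated tests and, only if that fails, repairs and executes them again.
func Run(
	tests string,
	execute func(tests string) (Assessments, error),
	repair func(tests string) (string, error),
) map[TaskIdentifier]Assessments {
	results := map[TaskIdentifier]Assessments{}

	original, err := execute(tests)
	if err == nil {
		// Assumption: nothing to repair, so both identifiers report the same result.
		results[WriteTests] = original
		results[WriteTestsSymflowerRepair] = original

		return results
	}

	// The unrepaired tests failed, so both results start out empty.
	results[WriteTests] = Assessments{}
	results[WriteTestsSymflowerRepair] = Assessments{}

	repaired, err := repair(tests)
	if err != nil {
		return results
	}
	if fixed, err := execute(repaired); err == nil {
		results[WriteTestsSymflowerRepair] = fixed
	}

	return results
}

var errTestsBroken = errors.New("tests do not compile")

// fakeExecute only "executes" tests that have been repaired.
func fakeExecute(tests string) (Assessments, error) {
	if tests != "fixed" {
		return nil, errTestsBroken
	}

	return Assessments{"files-executed": 1}, nil
}

// fakeRepair pretends that the repair always succeeds.
func fakeRepair(tests string) (string, error) {
	return "fixed", nil
}

func main() {
	results := Run("broken", fakeExecute, fakeRepair)
	// Prints "0 1": the original tests failed, the repaired ones executed.
	fmt.Println(results[WriteTests]["files-executed"], results[WriteTestsSymflowerRepair]["files-executed"])
}
```

Returning a map keyed by `TaskIdentifier` lets the scoring treat the unfixed and the fixed run as separate results without the caller needing to know whether a repair actually happened.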