Open microhu opened 2 months ago
I used a PPL-based eval strategy to avoid saving the KV cache. An output is counted as correct only when, at every answer position, the answer token has the highest logit. This produces the same result as generation-based eval with greedy decoding.
Dear author,
In the eval_forward function below, this does not look like real autoregressive decoding. Since you concatenate the input and answer_ids to form the new input_ids, it performs decoding in teacher-forcing mode, not true autoregressive decoding.
Am I correct?
def eval_forward(accelerator, model, input_ids, pad_id, answer_ids):
    # first append labels to input_ids
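To see why the teacher-forced check agrees with greedy decoding: if at every answer position the gold answer token is also the argmax of the logits, then greedy decoding fed the same prefix would emit exactly those tokens, so the two trajectories coincide. Below is a minimal, dependency-free sketch of that check; `answer_is_greedy_match` and the toy logits are hypothetical illustrations, not the repository's actual code.

```python
def answer_is_greedy_match(logits, input_len, answer_ids):
    """Return True iff every answer token has the highest logit at its
    position under teacher forcing (i.e. with the gold prefix fed in).

    Because each gold answer token is then also the greedy (argmax)
    choice, greedy decoding would produce the identical answer, so the
    PPL-style check and generation-based eval agree.

    logits[t] holds the next-token scores after consuming tokens 0..t
    of the concatenated (input + answer) sequence.
    """
    for i, tok in enumerate(answer_ids):
        # Logits predicting answer token i come from position input_len + i - 1.
        step = logits[input_len + i - 1]
        if max(range(len(step)), key=step.__getitem__) != tok:
            return False
    return True

# Toy example: vocabulary of 4 tokens, a 2-token prompt, 2-token answer [3, 1].
logits = [
    [0.1, 0.2, 0.3, 0.4],  # after token 0 (not an answer position here)
    [0.0, 0.0, 0.0, 9.0],  # after token 1: argmax 3 matches answer_ids[0]
    [0.0, 5.0, 0.0, 0.0],  # after token 2: argmax 1 matches answer_ids[1]
]
print(answer_is_greedy_match(logits, input_len=2, answer_ids=[3, 1]))  # True
print(answer_is_greedy_match(logits, input_len=2, answer_ids=[3, 2]))  # False
```

In practice one would take the argmax over a model's teacher-forced logits tensor in a single forward pass, which is what makes this strategy cheaper than step-by-step generation with a KV cache.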