The `RESPONSE` consumed by the `ROUGEScoreMetric` function is much longer than expected given the `output_features` and `max_sequence_length`. Even when `max_sequence_length` is 8 or 16 tokens, `RESPONSE` contains text as long as the text in `prompt_template`.
Based on my investigation, this happens because the condition in `get_decoded_targets_and_predictions` is wrong: instead of `targets != IGNORE_INDEX_TOKEN_ID`, it is set to `predictions[PREDICTIONS] != IGNORE_INDEX_TOKEN_ID`. From what I understand, we should use the targets' index to truncate the predictions correctly.
When I apply this change, I get the correct metric value, which matches the expectations set by the results seen during fine-tuning.
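Here is a minimal sketch of how the two masking conditions differ. It uses plain Python lists in place of Ludwig's tensors. It assumes `IGNORE_INDEX_TOKEN_ID` is -100 and uses a hypothetical pad ID of 0. The helpers `mask_by_predictions`, `mask_by_targets`, and `drop_pad` are illustrative names, not Ludwig APIs. `drop_pad` stands in for the tokenizer skipping pad tokens during decoding. Generated prediction IDs are presumably never equal to the ignore index, so masking on the predictions keeps every position. Masking on the targets keeps only the positions that carry a real target token.

```python
from typing import List

IGNORE_INDEX_TOKEN_ID = -100  # assumed value of the ignore index
PAD_TOKEN_ID = 0  # hypothetical pad token ID, for illustration only


def mask_by_predictions(targets: List[int], predictions: List[int]) -> List[int]:
    """Current condition: replace a prediction with pad only if the prediction itself is the ignore index."""
    return [p if p != IGNORE_INDEX_TOKEN_ID else PAD_TOKEN_ID for p in predictions]


def mask_by_targets(targets: List[int], predictions: List[int]) -> List[int]:
    """Proposed condition: replace a prediction with pad wherever the target is the ignore index."""
    return [p if t != IGNORE_INDEX_TOKEN_ID else PAD_TOKEN_ID for t, p in zip(targets, predictions)]


def drop_pad(ids: List[int]) -> List[int]:
    """Stand-in for decoding with pad tokens skipped: remove pad IDs."""
    return [i for i in ids if i != PAD_TOKEN_ID]


if __name__ == "__main__":
    # Prompt positions are ignored in the targets; only positions 4 and 5 hold the response.
    targets = [-100, -100, -100, -100, 17, 18, -100, -100]
    # The model emits a token at every position, including the prompt positions.
    predictions = [5, 6, 7, 8, 17, 19, 3, 4]
    print(drop_pad(mask_by_predictions(targets, predictions)))  # [5, 6, 7, 8, 17, 19, 3, 4]
    print(drop_pad(mask_by_targets(targets, predictions)))  # [17, 19]
```

With the prediction-based condition, all eight positions survive, which matches the prompt-length `RESPONSE` described above. With the target-based condition, only the two response positions remain.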