ana-ai-sde closed this pull request 1 month ago.
@ana-ai-sde Hey Ana, thanks for the PR! Have you run any tests on this change? Thanks!
Hi @frankaging,
I am the author of Ana - AI SDE.
Yes, we validated the change. We didn't include the test results because we are still working on the pull request template.
We ran the same command as reported in Issue 88.
Command:
python train.py -task gsm8k -model /home/Meta-Llama-3-8B-Instruct-function-calling-json-mode -seed 42 -l all -r 4 -p f7+l7 -e 12 -lr 9e-4 -type NodireftIntervention -gradient_accumulation_steps 4 -batch_size 1 -eval_batch_size 1 --dropout 0.05 --test_split validation --use_normalized_template --greedy_decoding --warmup_ratio 0.00 --weight_decay 0.06
Output without fix: [screenshot]
Output with fix: [screenshot]
With the fix, the RuntimeError no longer occurs, the code proceeds to the next steps, and training runs to completion.
If you have any questions, feel free to ask.
Thanks,
Arsh Anwar
LGTM!
Description:
Fix for Issue 88
This pull request addresses a RuntimeError caused by a shape mismatch during the left-padding adjustment in the compute_metrics function of examples/loreft/compute_metrics.py. The issue arises when the left_padding tensor is empty. The empty tensor cannot be broadcast against the intervention_locations tensor.

This patch was generated by Ana - AI SDE, an AI-powered software development assistant.
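The sketch below makes the failure mode concrete. It rests on these assumptions:
- The adjustment finds each row's BOS position with a nonzero call.
- It reshapes the result to [1, batch_size, 1].
- It adds that result in place to intervention_locations, laid out as [num_interventions, batch_size, num_positions].

The helper names and BOS_ID are hypothetical, not taken from the repository. The sketch uses NumPy, whose broadcasting rules PyTorch follows, so it runs without PyTorch installed.

```python
import numpy as np

# Hypothetical BOS id, used only for this illustration.
BOS_ID = 1


def left_padding_from_bos(input_ids, bos_id):
    """Return the column of each BOS token, i.e. how many left-pad tokens precede it.

    Assumes left padding and exactly one BOS token per row.
    """
    _rows, cols = np.nonzero(input_ids == bos_id)
    return cols


def apply_left_padding_unguarded(intervention_locations, left_padding):
    """Shift locations in place (hypothetical layout [num_interventions, batch_size, num_positions])."""
    # [batch_size] -> [1, batch_size, 1]; an empty input becomes shape (1, 0, 1)
    intervention_locations += left_padding.reshape(1, -1, 1)
    return intervention_locations


# A batch of one prompt tokenized WITHOUT the BOS id (no element equals BOS_ID).
input_ids = np.array([[0, 0, 11, 12, 13]])
locations = np.zeros((2, 1, 3), dtype=np.int64)

left_padding = left_padding_from_bos(input_ids, BOS_ID)
print(left_padding.shape)  # (0,)
try:
    apply_left_padding_unguarded(locations, left_padding)
except ValueError as err:
    # NumPy reports this as ValueError; the same in-place op on torch tensors raises RuntimeError.
    print("shape mismatch:", err)
```

The broadcast shape here is (2, 0, 3), which cannot be written back into the (2, 1, 3) array being updated. That mismatch is the kind of error described above.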
The fix adds a check that left_padding contains at least one element. If left_padding is empty, a warning message is printed and the adjustment is skipped. This avoids the shape mismatch and prevents the RuntimeError.

This patch makes the compute_metrics function more robust by handling edge cases in the left-padding adjustment.
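The guard described above can be sketched as follows. This is a minimal illustration, not the actual patch. The function name adjust_for_left_padding, the array layout, and BOS_ID are hypothetical assumptions. NumPy again stands in for PyTorch tensors.

```python
import numpy as np


def adjust_for_left_padding(intervention_locations, input_ids, bos_id):
    """Shift intervention locations by each row's left-pad count; skip if no BOS is found.

    Hypothetical helper; assumed layout [num_interventions, batch_size, num_positions].
    """
    # Column index of the BOS token in each row (assumes one BOS per row).
    left_padding = np.nonzero(input_ids == bos_id)[1]
    if left_padding.size == 0:
        # Mirrors the described fix: print a warning and leave the locations unchanged.
        print("Warning: no BOS token found; skipping left-padding adjustment.")
        return intervention_locations
    # [batch_size] -> [1, batch_size, 1] so it broadcasts over interventions and positions.
    return intervention_locations + left_padding.reshape(1, -1, 1)


BOS_ID = 1  # hypothetical BOS id for the example

# Two left-padded rows: BOS at column 2 and column 0.
locations = np.tile(np.array([0, 1, 2]), (2, 2, 1))  # shape (2, 2, 3)
padded = np.array([[0, 0, BOS_ID, 5, 6], [BOS_ID, 3, 4, 5, 6]])
print(adjust_for_left_padding(locations, padded, BOS_ID)[0].tolist())  # [[2, 3, 4], [0, 1, 2]]

# No BOS anywhere: the warning is printed and the input is returned unchanged.
single = np.zeros((2, 1, 3), dtype=np.int64)
print(adjust_for_left_padding(single, np.array([[0, 0, 11, 12, 13]]), BOS_ID) is single)
```

Like the description, this sketch only handles the case where left_padding is entirely empty. A batch in which only some rows contain the BOS id would still yield a left_padding length different from batch_size.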