Closed LZY-the-boys closed 9 months ago
Hi,
Thanks for your interest in our work!
I have just rerun the mentioned command
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
and it works well for me. I got an accuracy of 50.42.
To identify the issues, could you please run
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
without dropping the weights and see the accuracy of the original WizardMath-7B-V1.0 model? I got 55.34 accuracy and you can compare with this result to ensure your inference process is right.
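For context, the --weight_mask_rate / --use_weight_rescale flags correspond to randomly dropping a fraction of the fine-tuned delta parameters and rescaling the survivors. This is a minimal sketch of that idea, assuming the standard formulation (drop each entry with probability p, divide the rest by 1 - p to preserve the expected value); the function name and NumPy setup here are illustrative, not the repository's actual implementation:

```python
import numpy as np

def mask_and_rescale(delta, mask_rate, use_rescale=True, seed=0):
    # Hypothetical sketch: zero out `mask_rate` of the delta entries at
    # random, then optionally rescale the survivors by 1 / (1 - mask_rate)
    # so the expected magnitude of the delta is unchanged.
    rng = np.random.default_rng(seed)
    mask = rng.random(delta.shape) < mask_rate  # True = dropped entry
    masked = np.where(mask, 0.0, delta)
    if use_rescale:
        masked = masked / (1.0 - mask_rate)
    return masked
```

With mask_rate=0.9 and rescaling, roughly 10% of entries survive, each scaled by 10x, which is why the masked model can still recover most of the original accuracy.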
Thanks for your help! I have run with --weight_mask_rate 0.0
and got acc=0.5534495830174374
. However, I just cannot get --weight_mask_rate 0.9
to work, whether with rescale or not.
Could you please check the versions of the other required dependencies, such as PyTorch (2.0.1) and transformers (4.33.1)? The problem you mention is a bit strange, as --weight_mask_rate 0.9
works for me.
If the environments are also the same, I suggest running experiments while gradually increasing weight_mask_rate
through values like 0.1, 0.4, 0.7, and 0.9. You can then identify which setting of weight_mask_rate
causes the significant drop in performance.
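The suggested sweep can be scripted; this is a hedged sketch that just builds one command line per mask rate using the flags shown in the commands above (run each via subprocess, or print and launch them manually):

```python
def build_sweep_commands(rates, use_rescale=True):
    # Build one CLI invocation per mask rate, reusing the flags from the
    # commands above, to locate where accuracy starts to collapse.
    commands = []
    for rate in rates:
        cmd = [
            "python", "inference_llms_instruct_math_code.py",
            "--dataset_name", "gsm8k",
            "--finetuned_model_name", "WizardMath-7B-V1.0",
            "--tensor_parallel_size", "1",
            "--weight_mask_rate", str(rate),
        ]
        if use_rescale:
            cmd.append("--use_weight_rescale")
        commands.append(cmd)
    return commands

commands = build_sweep_commands([0.1, 0.4, 0.7, 0.9])
```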
Please feel free to ask when you finish running the above experiments.
Closing this issue now.
Please feel free to reopen it if there are any further questions.
The generated texts are all
''
(using vllm==0.1.4).
I debugged the code and found that it may be caused by
temperature=0.0
(greedy decoding). So I increased the temperature
to 0.01 and got garbled output. Can you help me figure this out?
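For intuition on why a tiny temperature behaves almost like greedy decoding: temperature scaling divides the logits before the softmax, so as the temperature approaches zero the distribution collapses onto the argmax token. This is a generic illustration of that math, not the vLLM code path itself:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable
    # softmax; small temperatures sharpen the distribution toward argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# At temperature 0.01, even a modest logit gap makes the top token
# take essentially all of the probability mass.
probs = softmax_with_temperature([2.0, 1.0, 0.5], 0.01)
```

So temperature=0.01 should sample almost identically to greedy decoding; if the output is empty at 0.0 but garbled at 0.01, the problem is more likely in the sampler setup or the vllm version than in the temperature value itself.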