kaistAI / LangBridge

[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
https://aclanthology.org/2024.acl-long.405/

Cannot Reproduce the Experimental Results #11

Closed Kosei1227 closed 2 months ago

Kosei1227 commented 2 months ago

Hi!! Thank you for the excellent paper and wonderful results.

As researchers working on low-resource languages, we would like to reproduce the experimental results and apply/improve the LangBridge approach for our target languages.

We ran the following command:

python eval_langbridge.py \
  --checkpoint_path kaist-ai/metamath-langbridge-9b \
  --enc_tokenizer kaist-ai/langbridge_encoder_tokenizer \
  --tasks mgsm_en,mgsm_es,mgsm_fr,mgsm_de,mgsm_ru,mgsm_zh,mgsm_ja,mgsm_th,mgsm_sw,mgsm_bn,mgsm_te \
  --instruction_template metamath \
  --batch_size 1 \
  --output_path eval_outputs/mgsm/metamath-langbridge_9b \
  --device cuda:2 \
  --no_cache

And we got this output:

kaist-ai/metamath-langbridge-9b (), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1

Task     Version  Metric  Value    Stderr
mgsm_bn 0 acc 0.040 ± 0.0124
mgsm_de 0 acc 0.108 ± 0.0197
mgsm_en 0 acc 0.152 ± 0.0228
mgsm_es 0 acc 0.084 ± 0.0176
mgsm_fr 0 acc 0.096 ± 0.0187
mgsm_ja 0 acc 0.060 ± 0.0151
mgsm_ru 0 acc 0.068 ± 0.0160
mgsm_sw 0 acc 0.024 ± 0.0097
mgsm_te 0 acc 0.036 ± 0.0118
mgsm_th 0 acc 0.076 ± 0.0168
mgsm_zh 0 acc 0.048 ± 0.0135

These values are considerably lower than those reported in the paper.

Have you encountered this issue before? Could you share the exact script used to produce the paper's results?

Thank you

Kosei1227 commented 2 months ago

Thank you for sharing this. Chrome and Outlook flagged the downloaded files/links as containing viruses and blocked them. Could you share clean download links?

MattYoon commented 2 months ago

Hi @Kosei1227, thank you for reporting.

Unfortunately, I was not able to reproduce your issue.

For the sake of time I only ran English, using the following script:

python eval_langbridge.py \
  --checkpoint_path kaist-ai/metamath-langbridge-9b \
  --enc_tokenizer kaist-ai/langbridge_encoder_tokenizer \
  --tasks mgsm_en \
  --instruction_template metamath \
  --batch_size 1 \
  --output_path eval_outputs/mgsm/metamath-langbridge_9b \
  --device cuda:0 \
  --no_cache

The result is:

Task Version Metric Value Stderr
mgsm_en 0 acc 0.62 ± 0.0308

MattYoon commented 2 months ago

Did you use fp16 precision by any chance?

Since mT5 and LangBridge were both trained in bf16 precision, running inference in fp16 may produce odd behavior. You need to use either bf16 or fp32.
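To illustrate the failure mode: fp16 and bf16 are both 16-bit formats, but fp16's largest finite value is about 65504, while bf16 keeps float32's 8-bit exponent and therefore covers float32's full range (at lower mantissa precision). Activations inside a bf16-trained model can legitimately exceed fp16's range and overflow. A stdlib-only sketch of the range difference (illustrative only, not part of the LangBridge codebase):

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision ('e' struct format).
    # Python's struct raises OverflowError for finite values beyond
    # fp16's max (~65504), where fp16 hardware would overflow to inf.
    return struct.unpack('e', struct.pack('e', x))[0]

def to_bf16(x: float) -> float:
    # bfloat16 keeps float32's 8-bit exponent and truncates the mantissa
    # to 7 bits: zero out the low 16 bits of the float32 bit pattern.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

x = 1.0e5  # an activation magnitude a bf16-trained model can produce
print(to_bf16(x))   # representable in bf16, with a small rounding error
try:
    to_fp16(x)
except OverflowError:
    print("fp16 overflow")  # the value simply does not fit in fp16
```

Real fp16 hardware saturates to inf rather than raising, but the point is the same: values the model was trained to produce in bf16 fall outside fp16's representable range, which is why inference must use bf16 or fp32.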

> Chrome and Outlook flagged the downloaded files/links as containing viruses and blocked them. Could you share clean download links?

Not sure what you mean by this?

ayushayush591 commented 2 months ago

@Kosei1227 Try changing the transformers version to the one specified in requirements.txt; that should fix the issue.
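For anyone hitting the same mismatch, a quick sanity check can confirm whether the installed transformers release matches the repo's pin. The helper below is a generic, hypothetical utility (the `matches_pin` name and the placeholder version string are illustrative, not from the LangBridge repo); check requirements.txt for the actual pinned version:

```python
from importlib.metadata import PackageNotFoundError, version

def matches_pin(package: str, pinned: str) -> bool:
    """Return True only if `package` is installed at exactly version `pinned`."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        # Package is not installed at all.
        return False

# Usage (placeholder version string, not the actual pin from requirements.txt):
# matches_pin("transformers", "4.x.y")
```

If the check fails, reinstalling with `pip install -r requirements.txt` restores the versions the repo was tested against.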

Kosei1227 commented 2 months ago

Thank you so much!! Changing the transformers version to match requirements.txt worked!

Here are the results I got, for future reference.

kaist-ai/metamath-langbridge-15b (), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1

Task     Version  Metric  Value    Stderr
mgsm_bn 0 acc 0.416 ± 0.0312
mgsm_de 0 acc 0.620 ± 0.0308
mgsm_en 0 acc 0.684 ± 0.0295
mgsm_es 0 acc 0.640 ± 0.0304
mgsm_fr 0 acc 0.612 ± 0.0309
mgsm_ja 0 acc 0.408 ± 0.0311
mgsm_ru 0 acc 0.612 ± 0.0309
mgsm_sw 0 acc 0.504 ± 0.0317
mgsm_te 0 acc 0.344 ± 0.0301
mgsm_th 0 acc 0.508 ± 0.0317
mgsm_zh 0 acc 0.480 ± 0.0317