Closed by Kosei1227 2 months ago
Hi @Kosei1227, thank you for reporting.
Unfortunately, I was not able to replicate your issue.
To save time, I only ran English, using the following script:
python eval_langbridge.py \
--checkpoint_path kaist-ai/metamath-langbridge-9b \
--enc_tokenizer kaist-ai/langbridge_encoder_tokenizer \
--tasks mgsm_en \
--instruction_template metamath \
--batch_size 1 \
--output_path eval_outputs/mgsm/metamath-langbridge_9b \
--device cuda:0 \
--no_cache
The result is:
Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
mgsm_en | 0 | acc | 0.62 | ± | 0.0308 |
Did you use fp16 precision by any chance?
Since mT5 and LangBridge were both trained in bf16 precision, running inference in fp16 may result in odd behavior. You need to use either bf16 or fp32.
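For anyone curious why the precision choice can matter, here is a minimal standard-library sketch of the range difference between fp16 and bf16. The helper names `to_fp16` and `to_bf16` and the sample value are hypothetical, and this is not LangBridge code. That overflow is what causes the odd behavior is an assumption; the thread does not pin down the exact mechanism.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range (max 65504).
        return math.copysign(math.inf, x)

def to_bf16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of float32.

    Truncation is used for simplicity; real hardware usually rounds to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

activation = 70000.0  # hypothetical large intermediate value
print("fp16:", to_fp16(activation))
print("bf16:", to_bf16(activation))
```

In this sketch the value overflows to infinity in fp16 but stays finite in bf16 (with coarser precision), because bf16 keeps the same exponent width as fp32.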
Thank you for sharing this. My Chrome and Outlook detected and blocked viruses of the downloaded files/links. Could you share clean download links?
Not sure what you mean by this?
@Kosei1227 Try changing your Transformers version to the one specified in requirement.txt; that should fix the issue.
Thank you so much!! Changing the Transformers version to match requirement.txt worked!
Here are the results I got, for future reference.
kaist-ai/metamath-langbridge-15b (), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1

Task | Version | Metric | Value | | Stderr |
---|---|---|---|---|---|
mgsm_bn | 0 | acc | 0.416 | ± | 0.0312 |
mgsm_de | 0 | acc | 0.620 | ± | 0.0308 |
mgsm_en | 0 | acc | 0.684 | ± | 0.0295 |
mgsm_es | 0 | acc | 0.640 | ± | 0.0304 |
mgsm_fr | 0 | acc | 0.612 | ± | 0.0309 |
mgsm_ja | 0 | acc | 0.408 | ± | 0.0311 |
mgsm_ru | 0 | acc | 0.612 | ± | 0.0309 |
mgsm_sw | 0 | acc | 0.504 | ± | 0.0317 |
mgsm_te | 0 | acc | 0.344 | ± | 0.0301 |
mgsm_th | 0 | acc | 0.508 | ± | 0.0317 |
mgsm_zh | 0 | acc | 0.480 | ± | 0.0317 |
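
As a quick sanity check on the numbers above, the sketch below computes the macro average over the 11 languages and recomputes a standard error for each accuracy. It assumes 250 problems per language and a standard error based on the sample (n - 1) variance. Neither is stated in this thread, and `acc_stderr` is a hypothetical helper, not a function from the evaluation harness.

```python
import math

# Accuracies copied from the results table above.
acc = {
    "mgsm_bn": 0.416, "mgsm_de": 0.620, "mgsm_en": 0.684, "mgsm_es": 0.640,
    "mgsm_fr": 0.612, "mgsm_ja": 0.408, "mgsm_ru": 0.612, "mgsm_sw": 0.504,
    "mgsm_te": 0.344, "mgsm_th": 0.508, "mgsm_zh": 0.480,
}

N_PROBLEMS = 250  # assumption: number of test problems per language

def acc_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the n - 1 variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))

macro_avg = sum(acc.values()) / len(acc)
print(f"macro average: {macro_avg:.3f}")
for task, p in acc.items():
    print(f"{task}: {p:.3f} ± {acc_stderr(p, N_PROBLEMS):.4f}")
```

Under these assumptions the recomputed values agree with the Stderr column to four decimal places, which suggests the reported accuracies and errors are internally consistent.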
Hi!! Thank you for the excellent paper and wonderful results.
As researchers working on low-resource languages, we want to reproduce the experimental results and then apply and improve the LangBridge approach for our target languages.
We ran the following code.
And we got this output.
These values are considerably lower than those reported in the paper.
Have you faced this issue in the past? Could you share the script you used to produce the experimental results?
Thank you