princeton-nlp / ALCE

[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
MIT License
463 stars 41 forks

Vicuna-13B results #24

Open yzc111 opened 8 months ago

yzc111 commented 8 months ago

Hello, when I reproduce the results on Vicuna-13B and Llama2-7B, I do not get any model output, and the code prints the warning: "Prompt exceeds max length and return an empty string as answer. If this happens too many times, it is suggested to make the prompt shorter". How should I deal with this? Thank you~

gaotianyu1350 commented 8 months ago

Hi,

Which config are you using? Vicuna and Llama2 models have a 4k context window, which limits how many passages you can fit in the prompt.
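As a rough illustration of why the warning fires (this is not ALCE's actual code; the function names and the word-based token estimate are hypothetical), you can think of the prompt builder as dropping retrieved passages until the assembled prompt fits the model's context budget:

```python
# Hypothetical sketch: trim retrieved passages so the prompt fits a context budget.
# A real setup would count tokens with the model's tokenizer; here we approximate
# with whitespace-split word counts just to show the idea.

def approx_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())

def fit_prompt(instruction: str, passages: list[str], question: str,
               budget: int = 4096, reserve_for_answer: int = 300) -> str:
    """Drop trailing (lowest-ranked) passages until the prompt fits the budget."""
    kept = list(passages)
    while kept:
        prompt = "\n\n".join([instruction, *kept, question])
        if approx_tokens(prompt) + reserve_for_answer <= budget:
            return prompt
        kept.pop()  # drop the lowest-ranked passage and retry
    # Even with no passages the prompt may overflow; at that point the caller
    # can only warn, which mirrors the "Prompt exceeds max length" message.
    return "\n\n".join([instruction, question])
```

With a 4k budget, a 2-shot prompt plus long instructions can already leave little room for the passages, which is why the lighter instruction config below helps.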

yzc111 commented 8 months ago

Hi, thank you for your reply. The config is 2-shot, 3-ndoc.

gaotianyu1350 commented 8 months ago

Did you use the "light instruction" version as well?

yzc111 commented 8 months ago

No, I just used the default setting.

gaotianyu1350 commented 8 months ago

Can you try this config (but change the model name): https://github.com/princeton-nlp/ALCE/blob/main/configs/asqa_alpaca-7b_shot2_ndoc3_gtr_light_inst.yaml

yzc111 commented 8 months ago

OK. thanks~

yzc111 commented 8 months ago

Another question: when I use the following settings to reproduce the result, I get QA-EM = 19.7 and MAUVE = 70.7, while the paper reports EM = 31.9 and MAUVE = 82.6. Are there any different settings in the config file?

    prompt_file: prompts/asqa_light_inst.json
    eval_file: data/asqa_eval_gtr_top100.json
    shot: 2
    ndoc: 3
    dataset_name: asqa
    tag: gtr_light_inst
    model: vicuna-13b
    temperature: 1.0
    top_p: 0.95

howard-yen commented 8 months ago

Note that there is a difference between EM and QA-EM; we report EM in the paper. Can you post the full output or .score file? Can you also post the link to the Vicuna model you are using? There are a couple of different versions with different performance.
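For context, the EM reported in the paper (`str_em` in the output) checks whether gold short answers appear in the generated text, while QA-EM runs a separate QA model over the output. A minimal sketch of the string-match style of EM (the normalization and function names here are my own, not the repo's exact code) might look like:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def str_em(output: str, qa_pairs: list[list[str]]) -> float:
    """Percentage of QA pairs whose gold short answer (any alias)
    appears as a substring of the normalized model output."""
    norm_out = normalize(output)
    hits = sum(
        any(normalize(ans) in norm_out for ans in answers)
        for answers in qa_pairs
    )
    return 100.0 * hits / len(qa_pairs)
```

Since each ASQA question has multiple disambiguated QA pairs, this per-example score is then averaged over the dataset.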

yzc111 commented 8 months ago

Hi, this is the config we used to reproduce the result on Vicuna-13B:

    prompt_file: prompts/asqa_light_inst.json
    eval_file: data/asqa_eval_gtr_top100.json
    shot: 2
    ndoc: 3
    dataset_name: asqa
    tag: gtr_light_inst
    model: /work/models/vicuna-13b
    temperature: 1.0
    top_p: 0.95

yzc111 commented 8 months ago

So, how can I get the EM score reported in your paper?

gaotianyu1350 commented 7 months ago

That is `str_em`.

yzc111 commented 7 months ago

Fine, thanks!