xlang-ai / BRIGHT

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
https://brightbenchmark.github.io/
Creative Commons Attribution 4.0 International

Chain-of-Thought baseline code #3

Closed · RulinShao closed this issue 1 month ago

RulinShao commented 1 month ago

Thanks for releasing this awesome work!!! I wonder if you plan to release the code you used to run the CoT+X baselines in your paper? I would also appreciate it if you could share the generated reasoning steps, if that's convenient. Thanks a lot!!!

hongjin-su commented 1 month ago

Thanks a lot for your interest!

All the generated reasoning steps have been uploaded to Hugging Face: https://huggingface.co/datasets/xlangai/BRIGHT. The subsets ending with "_reason" are the versions in which the original queries are replaced by LLM-generated reasoning steps.
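For reference, these subsets can be loaded with the datasets library. A minimal sketch, assuming a config named "gpt4_reason" following the "_reason" pattern above and a per-task split such as "biology" (check the dataset card for the exact identifiers):

from datasets import load_dataset

# Assumed config/split names following the "_reason" pattern described above;
# see the dataset card for the exact identifiers.
reason_queries = load_dataset("xlangai/BRIGHT", "gpt4_reason", split="biology")

# Each example should pair a query with the GPT-4 reasoning-step text
# that replaces the original query.
print(reason_queries[0])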

To evaluate models with the CoT steps generated by GPT-4, you can run the following:

python run.py --task {task} --model {model} --reasoning gpt4
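For example, a hypothetical invocation (assuming "biology" is a valid task name and "bm25" a supported model; see run.py for the accepted values):

python run.py --task biology --model bm25 --reasoning gpt4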

Feel free to let me know if there is anything I can help with!

RulinShao commented 1 month ago

Thank you so much for the timely response! This perfectly addressed my question, closing the issue ;)