yuchenlin / LLM-Blender

[ACL2023] We introduce LLM-Blender, an innovative ensembling framework that attains consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender cuts out weaknesses through ranking and integrates strengths through generation fusion to enhance the capabilities of LLMs.
https://yuchenlin.xyz/LLM-Blender/
Apache License 2.0

Data Generation Code #12

Closed tgyuan21 closed 8 months ago

tgyuan21 commented 10 months ago

Thanks for the work. Do you plan to release the code used to generate the MixInstruct dataset? It would be very helpful!

jdf-prog commented 10 months ago

The code is already released:

Code to construct the dataset: get_mixinstruct.py

Code for candidate generation: generate_candidates.sh

You can also use LLM-Gen to run the generation.
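For reference, here is a minimal sketch of what per-model candidate generation looks like with Hugging Face transformers; the model name and sampling parameters are illustrative assumptions, not the exact settings in generate_candidates.sh:

```python
# Sketch: sample several candidate responses from one open-source LLM for one instruction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # illustrative; MixInstruct draws candidates from many open LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

instruction = "Explain ensemble learning in one sentence."
inputs = tokenizer(instruction, return_tensors="pt").to(model.device)

# Sampling yields diverse candidates that a ranker can later compare and fuse.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    max_new_tokens=128,
    num_return_sequences=4,  # candidates per instruction for this model
)
# Strip the prompt tokens so only the generated continuations remain.
prompt_len = inputs["input_ids"].shape[1]
candidates = tokenizer.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)
for i, cand in enumerate(candidates):
    print(f"candidate {i}: {cand.strip()}")
```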

tgyuan21 commented 10 months ago

Thanks for the information. I ran into a CUDA out-of-memory issue when generating candidates. My setup is 2x T4 (15 GB each); are there any settings I can tune to reduce memory consumption?

jdf-prog commented 9 months ago

I think you can try reducing the batch size, using bfloat16, etc. Another possibility is that the model itself is too large; you can try Hugging Face quantization options such as load_in_4bit. However, we did not try quantization in our experiments, so its effect on the quality of the candidates is unknown.

Hope this information is helpful.
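For concreteness, a minimal sketch of those memory-saving options with transformers and bitsandbytes; the model name and config values below are illustrative assumptions, not settings from this repo:

```python
# Sketch of the options above: half precision and 4-bit quantization.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "lmsys/vicuna-7b-v1.5"  # illustrative; swap in the model that runs out of memory

# Option 1: half precision. T4 GPUs (Turing) lack native bfloat16 support,
# so float16 is the practical half-precision dtype on that hardware.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across both T4s (requires accelerate)
)

# Option 2: 4-bit quantization via bitsandbytes (the load_in_4bit route).
# Note: quantization effects on candidate quality were not measured in the paper.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
# In practice, load only one of these variants per process to avoid OOM.
```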

tgyuan21 commented 8 months ago

Thank you for the assistance!