magpie-align / magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
https://magpie-align.github.io/
MIT License

Have you thought about generating the SFT data for Deepseek-Coder-V2? #13

Open tangzhy opened 1 month ago

tangzhy commented 1 month ago

Deepseek-Coder-V2 is exceptionally capable in coding and mathematics, and in both domains the accuracy of a response can be verified automatically from its final result. It would be quite persuasive if your method could match their reported performance.
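For math, checking a response by its final result could be as simple as comparing the last number in the generated text with a reference answer. The snippet below is a minimal sketch under that assumption; `extract_final_answer` and `is_correct` are hypothetical names for illustration and are not part of the Magpie repository.

```python
import re
from fractions import Fraction

# Matches integers, decimals, and simple fractions such as 7, -3.5, or 1/2.
NUMBER = re.compile(r"-?\d+(?:/\d+|\.\d+)?")


def extract_final_answer(response: str):
    """Return the last number that appears in the response, or None."""
    matches = NUMBER.findall(response)
    return matches[-1] if matches else None


def is_correct(response: str, reference: str) -> bool:
    """Compare the response's final number with the reference exactly."""
    answer = extract_final_answer(response)
    if answer is None:
        return False
    try:
        # Fraction gives exact comparison, so "0.5" equals "1/2".
        return Fraction(answer) == Fraction(reference)
    except (ValueError, ZeroDivisionError):
        return False
```

This heuristic assumes the final number in the text is the answer, so it would misjudge responses that mention other numbers afterwards. Code would need a different check, such as running the generated program against unit tests.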

fly-dust commented 1 month ago

Hi, thanks for the suggestion! I have already put the run scripts for Deepseek-Coder-V2 here, and they work. However, most of our computing resources are currently devoted to the Gemma 2 series, which has no copyright claim, so we are not extracting Deepseek data for now.

fly-dust commented 1 month ago

We also discussed plans for extracting data from Deepseek-Coder-V2 in our internal meeting. Once Gemma 2 is done, we may come back to it!