Open UniverseFly opened 8 months ago
Thank you! Yes, we're planning to try generating code data. We could use the Magicoder instructions as seeds to generate coding tutorials, in a similar way to how we used UltraChat and OpenHermes. It might take a few iterations, though, since the result really depends on the coding performance of the LLM we use, much like the issues we've seen with math reasoning.
Wow, this is super cool work, and thanks for open sourcing everything!! I wonder whether Cosmopedia has tried using code data as seeds and rephrasing it into high-quality data? We did some exploration of this in Magicoder for instruction tuning, but in our case the "rephrasing" required very delicate prompt design, so I am quite excited about this development and would love to hear any insights on rephrasing code instructions.