I want to express my appreciation for the amazing dataset. I am curious whether the dataset's creators, or anyone else, have attempted to classify the instructions into topics (e.g., Science, Programming, Maths, Sports).
This information would be useful for developing a similar dataset for low-resource languages, where even ChatGPT performs poorly when prompted in the language itself. It would also help to examine which areas LLMs struggle with in low-resource languages (for instance, Programming prompts are usually not ideal for them). Also, any suggestions on how I could do this task myself would be greatly appreciated.
Thanks
Example of the format I have in mind:
{
"instruction": "Provide a CSS code for making all text boxes visible on the page.",
"input": "",
"output": "The CSS code for making all text boxes visible on the page is:....",
"topic": "programming"
}
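To make the request concrete, here is a rough sketch of how a "topic" field could be added to each record. It assumes a naive keyword heuristic; the `TOPIC_KEYWORDS` lists and the `tag_topic` function are placeholders I made up for illustration, not anything from the dataset or its tooling.

```python
import json
import re

# Hypothetical keyword lists, purely illustrative; a real taxonomy would need more care.
TOPIC_KEYWORDS = {
    "programming": ["css", "html", "python", "code", "function"],
    "maths": ["equation", "calculate", "integral", "sum"],
    "science": ["physics", "chemistry", "biology", "atom"],
    "sports": ["football", "tennis", "olympic", "cricket"],
}


def tag_topic(record):
    """Return a copy of the record with a 'topic' field added."""
    text = (record.get("instruction", "") + " " + record.get("input", "")).lower()
    words = set(re.findall(r"[a-z]+", text))
    # Score each topic by how many of its keywords appear as whole words.
    scores = {
        topic: sum(kw in words for kw in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to "other" when no keyword matches at all.
    topic = best if scores[best] > 0 else "other"
    return {**record, "topic": topic}


example = {
    "instruction": "Provide a CSS code for making all text boxes visible on the page.",
    "input": "",
    "output": "The CSS code for making all text boxes visible on the page is:....",
}
print(json.dumps(tag_topic(example), indent=2))
```

A keyword match like this only works on the English instructions and will mislabel many records, so I would treat it as a baseline to compare against whatever better approach people suggest.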