Topic/Category for Instruction

Hello,

I want to express my appreciation for this amazing dataset. Have the dataset's creators, or anyone else, attempted to classify the instructions into topics (e.g., Science, Programming, Maths, Sports)?

This information would be useful for developing a similar dataset for low-resource languages, where even ChatGPT performs poorly when prompted in the language itself. It would also help to examine which areas LLMs struggle with in low-resource languages (for instance, programming prompts are usually not ideal for low-resource languages). Any suggestions on how I could do this task myself would also be greatly appreciated.

Thanks

Example

{
  "instruction": "Provide a CSS code for making all text boxes visible on the page.",
  "input": "",
  "output": "The CSS code for making all text boxes visible on the page is:....",
  "topic": "programming"
}
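
One rough way to add such a "topic" field is a keyword heuristic. The sketch below is only an illustration. The keyword lists, the "other" fallback label, and the function name tag_topic are all hypothetical choices of mine, not anything shipped with the dataset. A trained classifier or an LLM-based labeler would likely be more accurate.

```python
import re

# Hypothetical keyword lists (assumption): a real taxonomy would need tuning.
TOPIC_KEYWORDS = {
    "programming": {"css", "html", "python", "code", "function", "sql"},
    "maths": {"equation", "solve", "calculate", "integral", "probability"},
    "science": {"physics", "chemistry", "biology", "atom", "planet"},
    "sports": {"football", "soccer", "tennis", "olympics", "cricket"},
}


def tag_topic(record):
    """Return a copy of an Alpaca-style record with a guessed "topic" field."""
    text = f"{record.get('instruction', '')} {record.get('input', '')}".lower()
    tokens = set(re.findall(r"[a-z]+", text))
    # Score each topic by how many of its keywords appear in the text.
    scores = {topic: len(tokens & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to "other" (hypothetical label) when no keyword matches.
    topic = best if scores[best] > 0 else "other"
    return {**record, "topic": topic}


example = {
    "instruction": "Provide a CSS code for making all text boxes visible on the page.",
    "input": "",
    "output": "The CSS code for making all text boxes visible on the page is:....",
}
print(tag_topic(example)["topic"])
```

Records that match no keyword end up as "other", so counting that label gives a quick sense of how much of the dataset the keyword lists fail to cover.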

tatsu-lab / stanford_alpaca

Topic/Category for Instruction #166