sileod / tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning
Apache License 2.0
149 stars 8 forks

Feature request: select tasks by language #4

Open avidale opened 1 year ago

avidale commented 1 year ago

Currently, the package doesn't allow choosing the language. I think many people who are developing models for specific languages (or language sets) would like to be able to access task data for a given language, so implementing this functionality would be a great help.

sileod commented 1 year ago

Hi, thanks for your suggestion! Currently, you can use the dataframe and check whether a language appears in the task names. But that's not enough: some datasets store the language in a dedicated column that the preprocessing removes. So it's not great, I agree. Proper language handling is on my roadmap.
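The name-based workaround described above could look roughly like the sketch below. It is only an illustration: the task names are made up, and `mentions_language` and `select_by_language` are hypothetical helpers, not part of tasksource. With the real dataframe, the same check could be applied to whichever column holds the dataset or config name. As noted, it will miss datasets whose language only lives in a column.

```python
import re

# Made-up task names for illustration; real names come from the tasksource dataframe.
task_names = [
    "xnli/fr",
    "xnli/de",
    "glue/mnli",
    "paws-x/fr",
    "french_news",
    "amazon_reviews_multi/de",
]


def mentions_language(name: str, lang_code: str) -> bool:
    """Return True if lang_code appears as a whole token in the name.

    Names are split on '/', '_', '-', '.' and whitespace, so 'fr' matches
    'xnli/fr' but not 'french_news'.
    """
    tokens = re.split(r"[/_\-.\s]+", name.lower())
    return lang_code.lower() in tokens


def select_by_language(names, lang_code):
    """Keep only the names that mention the given language code."""
    return [n for n in names if mentions_language(n, lang_code)]


print(select_by_language(task_names, "fr"))
```

Matching whole tokens rather than substrings avoids false positives such as "fr" inside "french", at the cost of missing names that spell the language out in full.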

avidale commented 1 year ago

Yes, adding the language IDs to the dataframe would be a great first step. Another potential enhancement is to make the file recast.py localizable, so that the user could provide the prompt templates in the chosen language instead of the default (English).
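One way the localization idea could work is a template table keyed by language, with English as the fallback. The sketch below is hypothetical: the template strings, the `DEFAULT_TEMPLATES` table and the `render` function are assumptions for illustration and do not reflect how recast.py is actually written.

```python
# Hypothetical template table; these strings are not taken from recast.py.
DEFAULT_TEMPLATES = {
    "en": {"nli": "{premise} Question: {hypothesis} True or false?"},
}


def render(task, lang="en", user_templates=None, **fields):
    """Fill in the template for a task in the requested language.

    User-supplied templates take priority; if the language or task is
    missing, fall back to the default English template.
    """
    user_templates = user_templates or {}
    template = (
        user_templates.get(lang, {}).get(task)
        or DEFAULT_TEMPLATES.get(lang, {}).get(task)
        or DEFAULT_TEMPLATES["en"][task]
    )
    return template.format(**fields)


# A user provides French templates without touching the defaults.
french = {"fr": {"nli": "{premise} Question : {hypothesis} Vrai ou faux ?"}}

print(render("nli", lang="fr", user_templates=french,
             premise="Il pleut.", hypothesis="Le sol est sec."))
```

Falling back to English keeps existing behavior unchanged for anyone who does not supply their own templates.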