@cash Looks like `ir_datasets` uses ISO 639-1 for its language codes. I believe we are using ISO 639-3 or 639-2/T, right?
I could just ignore the language information that `ir_datasets` gives me and trust the language code in the config file.
But I'd rather do a sanity check under the hood.
To convert the codes, we would need `pycountry`. Is it ok to introduce this dependency?
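
For reference, a rough sketch of what the check could look like. The function names `to_alpha_3` and `check_lang` and the small `_FALLBACK` table are hypothetical, made up for illustration; the only `pycountry` call used is `pycountry.languages.get(alpha_2=...)`. The fallback table is there only so the snippet runs without the dependency installed. It assumes our config uses ISO 639-3 / 639-2/T codes and does not handle 639-2/B variants such as `ger`.

```python
try:
    import pycountry  # third-party: pip install pycountry
except ImportError:
    pycountry = None

# Hypothetical fallback table, only so this sketch runs without pycountry.
_FALLBACK = {"de": "deu", "en": "eng", "fa": "fas", "ru": "rus", "zh": "zho"}


def to_alpha_3(code):
    """Return the three-letter code for a 2- or 3-letter code, or None if unknown."""
    code = code.lower()
    if len(code) == 3:
        return code  # assumed to already be ISO 639-3 / 639-2/T
    if pycountry is None:
        return _FALLBACK.get(code)
    try:
        lang = pycountry.languages.get(alpha_2=code)
    except KeyError:  # some pycountry releases raise instead of returning None
        return None
    return lang.alpha_3 if lang is not None else None


def check_lang(config_lang, dataset_lang):
    """Raise if the config language disagrees with what the dataset reports."""
    if dataset_lang is None:
        return  # the dataset reports no language, so there is nothing to compare
    expected = to_alpha_3(config_lang)
    found = to_alpha_3(dataset_lang)
    if found is not None and expected != found:
        raise ValueError(
            f"config says {config_lang!r} but dataset reports {dataset_lang!r}"
        )
```

With this, `check_lang("rus", "ru")` passes silently, while a mismatch such as `check_lang("zho", "fa")` raises a `ValueError`. Unknown dataset codes are let through rather than rejected, so the config file stays the source of truth.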