hltcoe / patapsco

Cross language information retrieval pipeline

incorporate ir_datasets #6

Closed eugene-yang closed 2 years ago

eugene-yang commented 2 years ago

@cash Looks like ir_datasets uses ISO 639-1 for its language codes. I believe we are using ISO 639-3 or 639-2/T, right? I could just ignore the language information that ir_datasets gives me and trust the language code in the config file, but I'd rather run a sanity check under the hood.

To convert the codes, we would need pycountry. Is it ok to introduce this dependency?
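
A minimal sketch of what that sanity check could look like, assuming the config holds a three-letter code and ir_datasets reports a two-letter one. The helper names `to_alpha3` and `check_language` are hypothetical and not part of patapsco; the small fallback table only exists so the sketch runs without pycountry installed.

```python
# Hypothetical helpers for illustration; not taken from the patapsco codebase.

# Tiny illustrative table, used only when pycountry is not installed.
_FALLBACK = {"en": "eng", "zh": "zho", "ru": "rus", "fa": "fas"}


def to_alpha3(code):
    """Return the three-letter code for a two-letter ISO 639-1 code.

    Three-letter input is passed through unchanged; unknown codes give None.
    """
    code = code.lower()
    if len(code) == 3:
        return code
    try:
        import pycountry
    except ImportError:
        return _FALLBACK.get(code)
    try:
        lang = pycountry.languages.get(alpha_2=code)
    except KeyError:  # older pycountry releases raise instead of returning None
        lang = None
    return lang.alpha_3 if lang is not None else None


def check_language(config_lang, dataset_lang):
    """Raise ValueError if the config and dataset languages disagree."""
    expected = to_alpha3(config_lang)
    found = to_alpha3(dataset_lang)
    if expected is None or found is None:
        raise ValueError(f"unknown language code: {config_lang!r} / {dataset_lang!r}")
    if expected != found:
        raise ValueError(f"config says {expected!r} but dataset reports {found!r}")
    return expected
```

For example, `check_language("zho", "zh")` returns `"zho"`, while `check_language("eng", "ru")` raises `ValueError`. The config stays the source of truth; the dataset's code is only used to catch mismatches.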

cash commented 2 years ago

@eugene-yang yes