masakhane-io / masakhane-mt

Machine Translation for Africa
MIT License

Updated Language_pairs.md to reflect new accepted pull requests #204

Closed: Michael-Beukman closed 2 years ago

Michael-Beukman commented 2 years ago

Added Tshivenda, Southern Ndebele and Afrikaans -> English.

I also updated the total number of language pairs and the total number of benchmarks using the following piece of code (where the input, named file, contains only the body rows of the table, i.e. the rows from "English | Afrikaans | etc |" through "| Afrikaans (JW300) | English |"):

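import re

# Matches a parenthesised or bracketed annotation such as "(JW300)".
ANNOTATION = re.compile(r"[\(\[].*?[\)\]]")

with open('file', 'r') as f:
    # Keep only non-blank rows so a trailing newline does not inflate the count.
    lines = [line for line in f if line.strip()]

# Every table row is one benchmark.
print("TOTAL = ", len(lines))

pairs = []
for line in lines:
    # Split the row into cells and drop the empty ones left by leading/trailing pipes.
    cells = [cell.strip() for cell in line.split("|")]
    cells = [cell for cell in cells if cell != '']
    if len(cells) < 2:
        continue  # not a "source | target" row
    # The first two cells are the source and target language.
    a, b = cells[:2]
    # Strip annotations and normalise case so variants of a pair count once.
    a = ANNOTATION.sub("", a).strip().lower()
    b = ANNOTATION.sub("", b).strip().lower()
    pairs.append((a, b))
print('Number of uniques =', len(set(pairs)))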
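
For anyone who wants to sanity-check the counting without the real table, here is a minimal sketch of the same logic wrapped in a function and run on a few made-up rows. The function name `count_pairs`, the sample rows, and the `example-*` cell values are illustrative assumptions and are not taken from Language_pairs.md.

```python
import re


def count_pairs(rows):
    """Return (number of valid rows, number of unique (source, target) pairs)."""
    # Matches a parenthesised or bracketed annotation such as "(JW300)".
    annotation = re.compile(r"[\(\[].*?[\)\]]")
    pairs = []
    for row in rows:
        # Split into cells, dropping the empty ones around leading/trailing pipes.
        cells = [c.strip() for c in row.split("|") if c.strip()]
        if len(cells) < 2:
            continue  # skip blank or malformed rows
        source, target = (annotation.sub("", c).strip().lower() for c in cells[:2])
        pairs.append((source, target))
    return len(pairs), len(set(pairs))


# Hypothetical rows for illustration only; not copied from the real table.
sample_rows = [
    "| English | Afrikaans | example-a |",
    "| English | Afrikaans (JW300) | example-b |",
    "| Afrikaans (JW300) | English | example-c |",
]

total, unique = count_pairs(sample_rows)
print("TOTAL =", total)
print("Number of uniques =", unique)
```

In this sketch the first two rows collapse into one pair because the "(JW300)" annotation is stripped before comparison, while the reversed direction counts as a separate pair.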