yannvgn / laserembeddings

LASER multilingual sentence embeddings as a pip package
BSD 3-Clause "New" or "Revised" License

what is the code of Persian language? #33

Open hatefap opened 3 years ago

hatefap commented 3 years ago

Hey, does this support the Persian/Farsi language? What language code should I pass to this function:

from laserembeddings import Laser

laser = Laser()
embeddings = laser.embed_sentences(
    ['let your neural network be polyglot',
     'use multilingual embeddings!'],
    lang='en')  # language code of the input sentences

tamohannes commented 3 years ago

@hatefap It should be fa, but LASER is not trained on a Persian/Farsi corpus, so it will automatically fall back to en.

hoschwenk commented 3 years ago

LASER supports Persian/Farsi. You may see a message "falling back to English", but this only comes from punctuation normalization, since there are no specific rules for Farsi/Persian. You can simply ignore it.
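
For illustration, here is a minimal sketch of passing fa alongside other language codes. The helper name embed_by_language is hypothetical and not part of laserembeddings. It only assumes that the laser argument exposes the embed_sentences(sentences, lang=...) method shown in the question above.

```python
from typing import Any, Dict, Sequence


def embed_by_language(laser: Any, sentences_by_lang: Dict[str, Sequence[str]]) -> Dict[str, Any]:
    """Embed each group of sentences using its own language code.

    `laser` is assumed to be a laserembeddings.Laser instance; any object
    with an embed_sentences(sentences, lang=...) method works as well.
    This helper is hypothetical and only illustrates the `lang` argument.
    """
    return {
        # The language code (e.g. 'fa' for Persian/Farsi) is passed through
        # unchanged to embed_sentences.
        lang: laser.embed_sentences(list(sentences), lang=lang)
        for lang, sentences in sentences_by_lang.items()
    }
```

With the package and its models installed, a call such as embed_by_language(Laser(), {'fa': ['سلام دنیا'], 'en': ['hello world']}) should return one array of embeddings per language code. A "falling back to English" message may still be printed for fa, as described above.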

tamohannes commented 3 years ago

@hoschwenk Thanks. It's strange, though: I couldn't find fa in Table 1 of the paper:

Artetxe, M. and Schwenk, H., 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7, pp.597-610.

hatefap commented 3 years ago

@HovhannesTamoyan @hoschwenk thank you very much!