neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License
321 stars, 46 forks

Can I use awesome-align for monosyllabic language #25

Closed (quocthang0507 closed this issue 3 years ago)

quocthang0507 commented 3 years ago

Can I use awesome-align for a monosyllabic language (e.g. Vietnamese), where one word can consist of several whitespace-separated syllables? For example, instead of "sinh"==="student" and "viên"==="student", I want "sinh viên"==="student". Thanks for your project.

zdou0830 commented 3 years ago

Hi, thanks! Right now awesome-align splits a sentence into words on whitespace and only supports outputting word-level alignments.

quocthang0507 commented 3 years ago

I found that your code uses src.strip().split() and tgt.strip().split() to tokenize each side. So I replaced those calls with another word segmenter that supports Vietnamese. Hope it works perfectly. 😂
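
For anyone who would rather not patch the code, a minimal sketch of doing the segmentation as a preprocessing step instead: join the syllables of each multi-syllable word with an underscore, so that the existing whitespace split sees "sinh_viên" as one token. Everything here is an assumption for illustration: segment, to_align_line and LEXICON are hypothetical names, the greedy longest-match lookup is only a stand-in for a real Vietnamese word segmenter, and how multilingual BERT handles the underscore-joined tokens has not been checked.

```python
def segment(sentence, lexicon, max_len=3):
    """Greedy longest-match segmentation over whitespace-separated syllables.

    Toy stand-in for a real Vietnamese segmenter (hypothetical helper).
    Multi-syllable words found in `lexicon` are joined with "_" so that a
    later str.split() keeps them together as a single token.
    """
    syllables = sentence.strip().split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, fall back to a single syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in lexicon:
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
    return words


def to_align_line(src, tgt, lexicon):
    """Build one 'source ||| target' line with a pre-segmented source side."""
    src_tokens = segment(src, lexicon)
    tgt_tokens = tgt.strip().split()  # target side: plain whitespace split
    return " ".join(src_tokens) + " ||| " + " ".join(tgt_tokens)


# Hypothetical one-entry lexicon, just enough for the example in this thread.
LEXICON = {"sinh viên"}

line = to_align_line("Tôi là sinh viên", "I am a student", LEXICON)
print(line)  # Tôi là sinh_viên ||| I am a student
```

With input prepared this way, src.strip().split() yields three source tokens instead of four, so any alignment reported for "student" would point at "sinh_viên" as a whole.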