machinalis / yalign

A sentence aligner for comparable corpora

please provide a phrase table demo #11

Open keyboardWitch opened 6 years ago

keyboardWitch commented 6 years ago

Hi, I found this alignment tool very useful, and I want to train a model of my own. However, I don't have a phrase table. Could you provide a phrase table demo? Many thanks!

echan00 commented 5 years ago

Same here, looking for a Chinese dictionary (phrase table)

echan00 commented 5 years ago

Did you have any luck?

keyboardWitch commented 5 years ago

I turned to another alignment program, based on Gale-Church alignment and the BLEU score of machine translation.

On Oct 24, 2018, at 12:00, Erik Chan notifications@github.com wrote:

Did you have any luck?


echan00 commented 5 years ago

Was it bleualign? Or something else? It would be great if you can share what you used.

keyboardWitch commented 5 years ago

Yes, bleualign. I made a task queue to automatically align the parallel web pages downloaded by Scrapy. The machine translation comes from Google.


echan00 commented 5 years ago

Thanks! I was also using bleualign initially, but I had too many documents to align, and using Google Translate is too expensive for my project.

keyboardWitch commented 5 years ago

You can use the free Google translation: slow down the requests and increase the concurrency. Yalign needs language models. I think bleualign is more useful for aligning short pages.
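
As a rough illustration of the "slow down the requests, increase the concurrency" idea, here is a minimal Python sketch, not the setup actually used in this thread. The `translate` argument is a hypothetical placeholder for whatever translation call you have, and `RateLimiter`, `translate_all`, `max_workers`, and `min_interval` are names invented for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that, across all threads, at most one call
    starts every `min_interval` seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep
        # outside the lock so other threads can reserve their own slots.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def translate_all(sentences, translate, max_workers=4, min_interval=1.0):
    """Translate sentences concurrently while throttling request starts.

    `translate` is a hypothetical callable (str -> str) supplied by the
    caller; this sketch does not talk to any real translation service.
    Results are returned in the same order as `sentences`.
    """
    limiter = RateLimiter(min_interval)

    def task(sentence):
        limiter.wait()
        return translate(sentence)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, sentences))
```

The shared limiter keeps the overall request rate low, while the thread pool lets several slow responses be in flight at once, so a long queue of pages still makes progress.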


LukeTu commented 4 years ago

Same problem here. But I saw that someone already has their language pair for alignment. I also found a paper that talks about improving the performance of Yalign: https://arxiv.org/abs/1512.01641. It doesn't mention the creation of a dictionary (phrase table), but it might be helpful.