own-pt / openWordnet-PT

OpenWordnet-PT: an open access wordnet for Portuguese
http://openwordnet-pt.org
Other
152 stars, 35 forks

parallel corpora #118

(Open) arademaker opened this issue 7 years ago

arademaker commented 7 years ago

Two sources of parallel corpora: http://opus.lingfil.uu.se/ (OPUS) and http://www.statmt.org/europarl/ (Europarl).

How can we use them to extend OpenWordnet-PT?

arademaker commented 6 years ago

Apart from the subtitles, there are many other corpora at http://opus.lingfil.uu.se

Unfortunately, you would need to do the word alignment yourself, for example with GIZA++: https://andreeaaussi.wordpress.com/2013/03/04/how-to-do-word-alignment-with-giza-from-parallel-corpora/
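Once the corpus is word-aligned, the alignment links can be turned into a rough Portuguese-English translation lexicon by counting how often each word pair is linked. The sketch below is only an illustration and assumes the alignments have already been converted to the Pharaoh-style `i-j` format (0-based source index, dash, 0-based target index), which is what symmetrized aligner output usually looks like. GIZA++'s own output files use a different layout and would need converting first. The function names `extract_lexicon` and `translations`, and the toy sentence pairs, are hypothetical.

```python
from collections import Counter
from typing import Iterable


def extract_lexicon(pairs: Iterable[tuple[str, str, str]]) -> Counter:
    """Count aligned (source word, target word) pairs.

    Each item is (source sentence, target sentence, alignment). The alignment
    is assumed to be a space-separated string of "i-j" links with 0-based
    token indices (Pharaoh-style format).
    """
    counts: Counter = Counter()
    for src, tgt, alignment in pairs:
        src_toks = src.lower().split()  # naive whitespace tokenization
        tgt_toks = tgt.lower().split()
        for link in alignment.split():
            i, j = map(int, link.split("-"))
            counts[(src_toks[i], tgt_toks[j])] += 1
    return counts


def translations(counts: Counter, word: str, min_count: int = 1) -> list[str]:
    """Return target words aligned to `word`, most frequent first."""
    candidates = [(t, c) for (s, t), c in counts.items() if s == word and c >= min_count]
    return [t for t, _ in sorted(candidates, key=lambda x: (-x[1], x[0]))]


# Toy, hand-aligned Portuguese-English sentence pairs (hypothetical data).
pairs = [
    ("o gato dorme", "the cat sleeps", "0-0 1-1 2-2"),
    ("o gato come", "the cat eats", "0-0 1-1 2-2"),
    ("um gato", "a cat", "0-0 1-1"),
]
lexicon = extract_lexicon(pairs)
print(translations(lexicon, "gato"))  # ['cat']
```

On a real corpus, raising `min_count` is a simple way to filter out noisy one-off alignment links.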

http://lojze.lugos.si/darja/

http://opus.nlpl.eu/OpenSubtitles.php

http://www.stories.org.br

FIŠER, Darja. Leveraging parallel corpora and existing wordnets for automatic construction of the Slovene wordnet. In: VETULANI, Zygmunt (ed.), USZKOREIT, Hans (ed.). Human language technology: challenges of the information society (Lecture notes in computer science, ISSN 0302-9743; Lecture notes in artificial intelligence, 5603).

FIŠER, Darja, SAGOT, Benoît. Combining multiple resources to build reliable wordnets. In: SOJKA, Petr (ed.). Text, speech and dialogue: proceedings (Lecture notes in computer science, ISSN 0302-9743; Lecture notes in artificial intelligence, 5346).

Also: http://www.lrec-conf.org/proceedings/lrec2014/pdf/121_Paper.pdf http://www.aclweb.org/anthology/W09-4202
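As a rough illustration of how an aligned lexicon could feed into the wordnet, one simple heuristic proposes a synset for a Portuguese word when several of its English translations share that synset. This is a simplified sketch loosely inspired by the multilingual-alignment idea in the papers above, not a reproduction of their method. The function `candidate_synsets`, the `min_support` threshold, and the toy mappings are all hypothetical. The synset IDs only imitate NLTK-style names and are not taken from a real WordNet lookup.

```python
from collections import Counter


def candidate_synsets(
    pt_word: str,
    lexicon: dict[str, list[str]],
    en_synsets: dict[str, set[str]],
    min_support: int = 2,
) -> list[str]:
    """Rank English synsets for a Portuguese word by how many of its
    distinct English translations belong to them (hypothetical heuristic)."""
    support: Counter = Counter()
    # Deduplicate translations so a repeated entry does not inflate support.
    for en_word in set(lexicon.get(pt_word, [])):
        for synset_id in en_synsets.get(en_word, set()):
            support[synset_id] += 1
    # Keep synsets shared by at least `min_support` translations, best first.
    return sorted(
        (s for s, n in support.items() if n >= min_support),
        key=lambda s: (-support[s], s),
    )


# Toy inputs (hypothetical): a PT->EN lexicon and an EN word->synset map.
lexicon = {"carro": ["car", "automobile"]}
en_synsets = {"car": {"car.n.01", "car.n.02"}, "automobile": {"car.n.01"}}
print(candidate_synsets("carro", lexicon, en_synsets))  # ['car.n.01']
```

Requiring agreement between several translations trades recall for precision: an ambiguous word such as "car" alone would propose every one of its senses, while the shared sense survives the intersection.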