Open arademaker opened 8 years ago
Apart from the subtitles, there are many other corpora at http://opus.lingfil.uu.se
Unfortunately, you'd need to do the word alignment yourself: https://andreeaaussi.wordpress.com/2013/03/04/how-to-do-word-alignment-with-giza-from-parallel-corpora/
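To give a rough idea of what "doing the word alignment yourself" involves, here is a minimal sketch of IBM Model 1, the simplest of the alignment models that GIZA++ trains. This is not GIZA++ itself, and the toy corpus, the function names `train_ibm1` and `align`, and the `<NULL>` token are assumptions made for illustration only. For real OPUS data you would run a proper aligner on tokenized, sentence-aligned files.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (illustrative only, not taken from OPUS):
# each item is (english_tokens, foreign_tokens).
TOY_CORPUS = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM, as in IBM Model 1."""
    foreign_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e

        # E-step: distribute each foreign word over the English words
        # of its sentence (plus a NULL word for unaligned tokens).
        for es, fs in pairs:
            es_null = ["<NULL>"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalise the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def align(es, fs, t):
    """Link each foreign token to its most probable English token.

    Returns (english_index, foreign_index) pairs; NULL is ignored here.
    """
    links = []
    for j, f in enumerate(fs):
        best_i = max(range(len(es)), key=lambda i: t[(f, es[i])])
        links.append((best_i, j))
    return links


if __name__ == "__main__":
    table = train_ibm1(TOY_CORPUS)
    for es, fs in TOY_CORPUS:
        print(es, fs, align(es, fs, table))
```

Once you have such links for a parallel corpus, the aligned word pairs are what you would project wordnet senses across, which seems to be the approach the papers cited in this thread build on.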
http://opus.nlpl.eu/OpenSubtitles.php
FIŠER, Darja. Leveraging parallel corpora and existing wordnets for automatic construction of the Slovene wordnet. In: VETULANI, Zygmunt (ed.), USZKOREIT, Hans (ed.). Human language technology: challenges of the information society, (Lecture notes in computer science, ISSN 0302-9743, Lecture notes in artificial intelligence, 5603).
FIŠER, Darja, SAGOT, Benoît. Combining multiple resources to build reliable wordnets. In: SOJKA, Petr (ed.). Text, speech and dialogue: proceedings, (Lecture notes in computer science, ISSN 0302-9743), (Lecture notes in artificial intelligence, 5346).
Also: http://www.lrec-conf.org/proceedings/lrec2014/pdf/121_Paper.pdf http://www.aclweb.org/anthology/W09-4202
http://opus.lingfil.uu.se/ http://www.statmt.org/europarl/
How can we use them?