Closed: stranak closed 6 years ago
Unfortunately the download contains only 40-50 sentences per language, while documentation mentions 2000+ sentences :-( I downloaded the data recently and tried to contact the author about it but got no reply.
Hm, that is very unfortunate indeed. It is barely worth the work for 50 sentences, even if it may be rather little work, given the Tiger XML format. For 2000 it would definitely be worth it.
Will you write to the author and ask him about it?
Go to /net/data/treebanks/GRUG and try
grep '<s ' GEO/*.tig | wc -l
GEO: 45
GER: 45
RUS: 46
UKR: 50
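For anyone who wants to repeat the count for all four languages in one go, here is a rough Python sketch. It counts `<s>` elements with an XML parser rather than matching text lines, so it should not be thrown off by several sentences sharing a line. The function name `count_sentences` is made up for this example, and the sketch assumes that every language has its own subdirectory of `*.tig` files, as GEO does in the grep command above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_sentences(corpus_dir, languages=("GEO", "GER", "RUS", "UKR"), pattern="*.tig"):
    """Return {language: number of <s> elements} for Tiger XML files.

    Assumes one subdirectory per language under corpus_dir (hypothetical
    layout, inferred from the GEO/*.tig path in the thread).
    """
    counts = {}
    for lang in languages:
        total = 0
        for path in sorted(Path(corpus_dir, lang).glob(pattern)):
            # Stream the file instead of loading the whole tree at once.
            for _event, elem in ET.iterparse(path, events=("end",)):
                # Drop an XML namespace prefix, if present, before comparing.
                if elem.tag.rsplit("}", 1)[-1] == "s":
                    total += 1
                    elem.clear()  # free the finished sentence subtree
        counts[lang] = total
    return counts
```

Calling `count_sentences("/net/data/treebanks/GRUG")` and printing the resulting dictionary should give one count per language, provided the files are well-formed XML.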
I wrote to Oleg Kapanadze on March 30 and asked whether bigger data was available but he has not replied yet.
The data is at http://fedora.clarin-d.uni-saarland.de/grug/ and it is CC-BY, in Tiger XML format.