saad120 / dkpro-wsd

Automatically exported from code.google.com/p/dkpro-wsd
Using SemCorXMLReader to read Senseval-3 data. #58

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The problem:
Using SemCorXMLReader to read Senseval-3 data.

Senseval-3 data:
http://www.cse.unt.edu/~rada/senseval/senseval3/data/EnglishAW/EnglishAW.test.tar.gz

SemCorXMLReader:
https://code.google.com/p/dkpro-wsd/source/browse/trunk/de.tudarmstadt.ukp.dkpro.wsd.io/src/main/java/de/tudarmstadt/ukp/dkpro/wsd/io/reader/SemCorXMLReader.java

What is the expected output? What do you see instead?
I get the error "The XML parser encountered an unknown element type: corpus.",
raised as a uima.collection.CollectionException.

Additional information:
Can anyone tell if SemCorXMLReader is actually capable of reading this data? 
Any alternatives? 

Original issue reported on code.google.com by alot...@gmail.com on 27 Jun 2014 at 12:22

GoogleCodeExporter commented 9 years ago
Looks like a problem with the input files, not with DKPro WSD. Please try
patching your input files as described at
https://code.google.com/p/dkpro-wsd/wiki/FAQ#Why_am_I_getting_a_lot_of_errors_or_warnings_when_reading_in_the
and reopen this issue if that doesn't work.

Original comment by tristan.miller@nothingisreal.com on 27 Jun 2014 at 12:38

GoogleCodeExporter commented 9 years ago
This was solved by using the script "fix_mihalcea_senseval3.sh" to convert the
files to readable XML.

Data location: http://www.cse.unt.edu/~rada/downloads.html#sensevalsemcor

Thanks, Tristan!

Original comment by alot...@gmail.com on 27 Jun 2014 at 12:54
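
For anyone hitting the same error: the message names the element "corpus", so a quick way to see what a reader is being handed is to print the root element of the input file before and after patching it. The following is only a minimal sketch using the Python standard library. The helper name root_element_name is hypothetical and not part of DKPro WSD, and the sample document is made up for illustration; it is not taken from the Senseval-3 archive.

```python
import io
import xml.etree.ElementTree as ET


def root_element_name(stream):
    """Return the tag of the first element in an XML stream.

    Returns None if the parser fails before reaching any element
    (for example, if the input is not XML at all).
    """
    try:
        # iterparse reports a "start" event as soon as an opening tag
        # is seen; the first such event belongs to the root element.
        for _event, element in ET.iterparse(stream, events=("start",)):
            return element.tag
    except ET.ParseError:
        return None
    return None


if __name__ == "__main__":
    # Made-up sample input; for a real file, open it in binary mode,
    # e.g. `with open(path, "rb") as f: root_element_name(f)`.
    sample = io.BytesIO(b"<corpus><text id='d000'/></corpus>")
    print(root_element_name(sample))
```

This only reports the root tag; it does not validate the file against any DTD, so it is a diagnostic aid rather than a replacement for the patch script mentioned above.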