jbjorne / TEES

Turku Event Extraction System

Converting DDI13 corpus. #6

Closed jaredrox closed 8 years ago

jaredrox commented 11 years ago

Hi,

I am running into errors while installing the corpus. When I install it from the configure program, I get the first error log below. When I run convertDDI13.py directly, I get the second.

Is there any known issue about this? I am currently using Ubuntu 12.04 (the second log seems to be related to the file system).

Thanks in advance, Jared.

First log:

Log opened at Thu Mar 14 18:29:44 2013

Command line: configure.py
[18:29:44 14/03] ======================= Converting DDI'13 corpus =======================
[18:29:44 14/03] --------------- Downloading DDI'13 Shared Task files ---------------
[18:29:44 14/03] Downloading file http://www.cs.york.ac.uk/semeval-2013/task9/data/uploads/datasets/train/semeval_task9_train.zip to /home/ignacio/.tees/corpora/download/semeval_task9_train.zip
[18:29:46 14/03] Extracting package /home/ignacio/.tees/corpora/download/semeval_task9_train.zip
[18:29:52 14/03] Redirected to http://heanet.dl.sourceforge.net/project/tees/data/DDI13-TEES-parses-130224.tar.gz
[18:29:52 14/03] Downloading file http://heanet.dl.sourceforge.net/project/tees/data/DDI13-TEES-parses-130224.tar.gz to /home/ignacio/.tees/corpora/download/DDI13-TEES-parses-130224.tar.gz
[18:29:57 14/03] Extracting package /home/ignacio/.tees/corpora/download/DDI13-TEES-parses-130224.tar.gz
[18:29:58 14/03]
[18:29:58 14/03] * Exception processing menu 'Corpora' option 'i (Install) *
[18:29:58 14/03] Exception: 'bool' object is not iterable
[18:29:58 14/03] Traceback (most recent call last):
[18:29:58 14/03] File "/media/DATA/Ingeniería Linguística/TEES/TEES/Utils/Menu.py", line 229, in _runHandler
[18:29:58 14/03] handler(*handlerArgs)
[18:29:58 14/03] File "/media/DATA/Ingeniería Linguística/TEES/TEES/Utils/Convert/convertDDI13.py", line 137, in convertDDI13
[18:29:58 14/03] for dataset in datasets:
[18:29:58 14/03] TypeError: 'bool' object is not iterable

Second log:

Log opened at Mon Apr 1 14:30:22 2013

Command line: convertDDI13.py
[14:30:22 01/04] ======================= Converting DDI'13 corpus =======================
[14:30:22 01/04] --------------- Downloading DDI'13 Shared Task files ---------------
[14:30:22 01/04] Skipping already downloaded file http://www.cs.york.ac.uk/semeval-2013/task9/data/uploads/datasets/train/semeval_task9_train.zip
[14:30:22 01/04] Extracting package /home/ignacio/.tees/corpora/download/semeval_task9_train.zip
[14:30:24 01/04] Redirected to http://surfnet.dl.sourceforge.net/project/tees/data/DDI13-TEES-parses-130224.tar.gz
[14:30:24 01/04] Skipping already downloaded file http://sourceforge.net/projects/tees/files/data/DDI13-TEES-parses-130224.tar.gz
[14:30:24 01/04] Extracting package /home/ignacio/.tees/corpora/download/DDI13-TEES-parses-130224.tar.gz
[14:30:24 01/04] Merging input XMLs
[14:30:24 01/04] Processing elements
[14:30:24 01/04] Dividing training set into folds
[14:30:24 01/04] Inserting McCC parses
[14:30:24 01/04] Loading corpus <ElementTree object at 0x13d1d50>
[14:30:24 01/04] Corpus file loaded
[14:30:24 01/04] Inserting parses from ['/tmp/tmpIZvioo/DDI13-TEES-parses-130224']
[14:30:25 01/04] Traceback (most recent call last):
[14:30:25 01/04] File "convertDDI13.py", line 205, in
[14:30:25 01/04] convertDDI13(options.outdir, options.downloaddir, options.datasets, options.redownload, not options.noparses, options.parse, options.intermediateFiles, options.debug)
[14:30:25 01/04] File "convertDDI13.py", line 159, in convertDDI13
[14:30:25 01/04] Tools.BLLIPParser.insertParses(corpusTree, downloaded[dataset + "_TEES_PARSES"], None, extraAttributes={"source":"TEES"})
[14:30:25 01/04] File "/home/ignacio/TEES/Tools/BLLIPParser.py", line 342, in insertParses
[14:30:25 01/04] assert os.path.exists(parsePath)
[14:30:25 01/04] File "/usr/lib/python2.7/genericpath.py", line 18, in exists
[14:30:25 01/04] os.stat(path)
[14:30:25 01/04] TypeError: coercing to Unicode: need string or buffer, list found

jbjorne commented 11 years ago

Dear Jared,

Sorry about the conversion issues. For now, please use the preconverted corpora for DDI13, available from http://sourceforge.net/projects/tees/files/analyses/ for the training set and from the SemEval FTP server for the test set. There are some problems with the current conversion script, as it was quickly updated to handle the test data. Once the test data becomes available (or we know whether it will become available), the conversion script will be fixed.

Best Regards, Jari
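
For readers hitting the same tracebacks: both are argument-type errors rather than file-system problems. In the first log, `datasets` reaches the `for dataset in datasets:` loop as a bool. In the second, `parsePath` reaches `os.path.exists` as a one-element list (the log line "Inserting parses from ['/tmp/...']" shows the list). The sketch below is not the TEES code or its eventual fix. It only illustrates, under assumptions, how such arguments could be normalized before use. The helper names `normalize_datasets` and `single_path` and the default dataset name `DDI13_TRAIN` are hypothetical.

```python
def normalize_datasets(datasets, default=("DDI13_TRAIN",)):
    """Return a list of dataset names, whatever form the caller passed.

    The default name is a placeholder for illustration only.
    """
    if datasets is None or isinstance(datasets, bool):
        # A bool or None carries no dataset names; iterating over it
        # would raise "TypeError: 'bool' object is not iterable".
        return list(default)
    if isinstance(datasets, str):
        # Accept a comma-separated string such as "A,B".
        return [name.strip() for name in datasets.split(",") if name.strip()]
    return list(datasets)


def single_path(path):
    """Return one path string, unwrapping a one-element list or tuple."""
    if isinstance(path, (list, tuple)):
        if len(path) != 1:
            raise ValueError("expected exactly one path, got %d" % len(path))
        path = path[0]
    return path


if __name__ == "__main__":
    print(normalize_datasets(True))
    print(single_path(["/tmp/tmpIZvioo/DDI13-TEES-parses-130224"]))
```

Checking the types at the boundary like this turns both failures into either a sensible default or an explicit error message, instead of a TypeError raised deep inside the conversion.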