minstar closed this issue 4 years ago.
I'm getting the same error despite following the README as closely as possible.
I just pulled the repo the other day.
@iftenney
I'm out of the office now, but will debug this later this week.
Is there any chance I can try this now?
Sorry for the delay. I'm working on this; if you need a copy in the meantime, please email me (iftenney -at- gmail).
Can you give the exact command you used, and any output before the error message?
I'm not able to reproduce this. I tested the following:
# set up a fresh environment
git clone --recursive git@github.com:nyu-mll/jiant.git jiant
cd jiant
conda env create -f environment.yml
conda activate jiant
# process each OntoNotes task
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=ner -o /tmp/onto_ner
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=const -o /tmp/onto_const
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=srl -o /tmp/onto_srl
and didn't see a crash.
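The four extraction commands above differ only in the task name, so they can be driven from a loop that stops at the first failure and shows which task crashed. A minimal Python sketch, assuming the same ~/data/conll-formatted-ontonotes-5.0 input and /tmp/onto_<task> output locations used above; `build_extract_commands` and `run_all` are hypothetical helper names, not part of jiant.

```python
import os
import subprocess
import sys
from typing import List

# Task names passed to --tasks in the commands above.
TASKS = ["coref", "ner", "const", "srl"]


def build_extract_commands(ontonotes_dir: str, out_prefix: str = "/tmp/onto_") -> List[List[str]]:
    """Return one argv list per task (hypothetical helper, not part of jiant)."""
    # The shell expands "~" in the original commands; subprocess does not, so expand it here.
    ontonotes_dir = os.path.expanduser(ontonotes_dir)
    return [
        [
            sys.executable,  # interpreter of the active (e.g. jiant conda) environment
            "probing/data/extract_ontonotes_all.py",
            "--ontonotes",
            ontonotes_dir,
            f"--tasks={task}",
            "-o",
            f"{out_prefix}{task}",
        ]
        for task in TASKS
    ]


def run_all(ontonotes_dir: str) -> None:
    """Run each extraction in order; check=True raises on the first failing task."""
    for cmd in build_extract_commands(ontonotes_dir):
        print("->", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Run it from the jiant repository root, e.g. `run_all("~/data/conll-formatted-ontonotes-5.0")`. Because `check=True` raises `CalledProcessError`, the last `->` line printed is the command that failed.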
If you're using get_and_process_all_data.sh, did you set the path to OntoNotes at the beginning of the script? What does your conll-formatted-ontonotes-5.0 directory contain? And which version of allennlp do you have installed? (I tested with 0.8.4, per environment.yml.)
My allennlp version is the same as yours, and I changed the path to OntoNotes in get_and_process_all_data.sh as you did.
Also, my conll-formatted-ontonotes-5.0 contains data/conll-2012-test, data/development, data/train, and data/test.
Each of these has annotations in the bc, bn, mz, nw, pt, tc, and wb folders.
However, I tried the first extract_ontonotes_all.py command, which is
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref
but it shows the same error as above.
Does your OntoNotes folder have the .conll files?
cd conll-formatted-ontonotes-5.0
ls data/train/data/english/annotations/bc/cctv/00
Should look something like:
cctv_0001.gold_conll cctv_0002.gold_skel cctv_0004.gold_conll
cctv_0001.gold_skel cctv_0003.gold_conll cctv_0004.gold_skel
cctv_0002.gold_conll cctv_0003.gold_skel
The LDC corpus doesn't include the .conll files by default; they have to be generated following step 3 at http://cemantix.org/data/ontonotes.html.
(Also see the AllenNLP corpus reader documentation at https://github.com/allenai/allennlp/blob/v0.8.4/allennlp/data/dataset_readers/dataset_utils/ontonotes.py#L83)
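To see why missing .conll files produce no output at all, here is a minimal sketch that mirrors the filename filter in the AllenNLP OntoNotes reader linked above: it walks the directory tree and keeps only files ending in gold_conll, skipping the .gold_skel skeletons. This is not AllenNLP's code, and `count_by_suffix`, `gold_conll_paths`, and `count_ontonotes_files` are hypothetical helpers. If the gold_conll count is zero, the reader has nothing to read, which is consistent with the empty output reported in this thread.

```python
import os
from typing import Dict, Iterable, Iterator

SUFFIXES = ("gold_conll", "gold_skel")


def count_by_suffix(filenames: Iterable[str]) -> Dict[str, int]:
    """Count how many filenames end in each OntoNotes suffix."""
    counts = {suffix: 0 for suffix in SUFFIXES}
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                counts[suffix] += 1
    return counts


def gold_conll_paths(root: str) -> Iterator[str]:
    """Yield only *gold_conll files, mirroring the reader's filter (hypothetical helper)."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            # .gold_skel skeletons are skipped; only generated .gold_conll files are yielded.
            if name.endswith("gold_conll"):
                yield os.path.join(dirpath, name)


def count_ontonotes_files(root: str) -> Dict[str, int]:
    """Summarize a tree, e.g. {'gold_conll': 0, 'gold_skel': N} before step 3 has been run."""
    return count_by_suffix(name for _, _, filenames in os.walk(root) for name in filenames)
```

For example, `count_ontonotes_files(os.path.expanduser("~/data/conll-formatted-ontonotes-5.0/data/train"))`. Note that a mistyped path also reports all zeros, because `os.walk` silently yields nothing for a directory that doesn't exist.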
Oh, now I've solved my problem. Thanks a lot!
Great, glad that helped! Please let me know if you have any other questions.
Hi, I ran into the same issue too. I'm trying to do step 3 from http://cemantix.org/data/ontonotes.html, but where can I download the scripts, such as skeleton2conll.sh?
The download link on the page is either missing or broken. Is there any solution?
Okay, after some searching I found the scripts here; it seems the download link on the website is broken.
Hi, I found the scripts, but I'm running into an error saying:
could not find the gold parse [conll-formatted-ontonotes-5.0/data/train/data/english/annotations/mz/sinorama/10/ectb_1031.parse] in the ontonotes distribution ... exiting ...
These are the steps I followed:
1. Downloaded v12 from here.
2. Uncompressed the file to get conll-formatted-ontonotes-5.0-12, which contained conll-formatted-ontonotes-5.0.
3. Ran ls conll-formatted-ontonotes-5.0/data/train/data/english/annotations/bc/cctv/00 and got
   cctv_0001.gold_skel cctv_0002.gold_skel cctv_0003.gold_skel cctv_0004.gold_skel
   There were no .conll files, so I tried following step 3 from http://cemantix.org/data/ontonotes.html by downloading the scripts from https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO/tree/master/conll-formatted-ontonotes-5.0/scripts and placing them in the conll-formatted-ontonotes-5.0-12/scripts folder.
4. Tried running bash conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D conll-formatted-ontonotes-5.0/data/train/data/ conll-formatted-ontonotes-5.0/
But I'm getting the above error. Am I doing anything wrong here? Please help.
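In the skeleton2conll.py invocation printed in the traceback later in this thread, the .parse file is read from ontonotes-release-5.0/data/files/data/... (the LDC release), while the .gold_skel input and .gold_conll output live under conll-formatted-ontonotes-5.0/data/<split>/data/.... The error above, by contrast, has the script looking for the .parse file inside conll-formatted-ontonotes-5.0, which suggests -D was given the conll-formatted directory rather than the LDC release's data/files/data directory. A minimal sketch, assuming that layout, that derives the expected .parse path from a skeleton path so you can check whether the LDC files are where the script will look; `parse_path_for_skel` and `missing_parse_files` are hypothetical helpers, not part of the conversion scripts.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, List


def parse_path_for_skel(skel_path: str, ldc_data_dir: str) -> str:
    """Map a .gold_skel path to the LDC .parse path it is merged with (hypothetical helper).

    Assumes skel_path contains .../<language>/annotations/<genre>/<source>/<nn>/<doc>.gold_skel
    and ldc_data_dir is the LDC release's data/files/data directory.
    """
    parts = PurePosixPath(skel_path).parts
    if "annotations" not in parts or parts.index("annotations") == 0:
        raise ValueError(f"expected '<language>/annotations/...' in {skel_path!r}")
    idx = parts.index("annotations")
    # Keep everything from the language directory (e.g. "english") onward.
    relative = PurePosixPath(*parts[idx - 1:])
    # "ectb_1029.gold_skel" -> "ectb_1029.parse"
    return str(PurePosixPath(ldc_data_dir) / relative.with_suffix(".parse"))


def missing_parse_files(skel_paths: Iterable[str], ldc_data_dir: str) -> List[str]:
    """Return the expected .parse paths that do not exist on disk."""
    expected = (parse_path_for_skel(p, ldc_data_dir) for p in skel_paths)
    return [p for p in expected if not os.path.exists(p)]
```

If `missing_parse_files` returns anything for your skeletons, the LDC release is not at the directory you passed; a non-empty result for every file usually means the wrong root rather than a few missing documents.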
@minstar
So I used the scripts Raghava14 provided, which are described at http://cemantix.org/data/ontonotes.html, and got the following error:
File "../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py", line 392
except InvalidSexprException, e:
^
SyntaxError: invalid syntax
Exit code: 1
./skeleton2conll.sh: line 93: break: only meaningful in a `for', `while', or `until' loop
-> python ../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py ../../../ontonotes-release-5.0/data/files/data/english/annotations/mz/sinorama/10/ectb_1029.parse ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_skel ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_conll -edited --text
Has anyone encountered this error, and can anyone help me solve it? Thanks! @iftenney
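The SyntaxError points at Python 2 syntax: `except InvalidSexprException, e:` only parses under Python 2, whereas Python 2.6+ and Python 3 spell it `except InvalidSexprException as e:`. The `-> python ...` line shows the shell script invoking plain `python`; if that resolves to a Python 3 interpreter (likely inside a Python 3 conda environment), the script fails before converting anything. Running the scripts with a Python 2 interpreter, or porting them, are the obvious options; other Python 2-only constructs may also be present, so changing this one line may not be enough. The snippet below only checks which spelling the current interpreter accepts and does not modify the script. (Python 3.14 relaxes the grammar for unparenthesized exception lists under PEP 758, so there the first form may parse, but as a tuple of exception types rather than binding `e`.)

```python
import sys

# The clause from skeleton2conll.py (Python 2 only) and its Python 2.6+/3 spelling.
PY2_FORM = "try:\n    pass\nexcept InvalidSexprException, e:\n    pass\n"
PY3_FORM = "try:\n    pass\nexcept InvalidSexprException as e:\n    pass\n"


def compiles(source: str) -> bool:
    """Return True if the running interpreter accepts source as valid syntax."""
    try:
        # compile() only parses; the exception name does not need to exist.
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print("Python", ".".join(map(str, sys.version_info[:2])))
for label, src in [("except X, e", PY2_FORM), ("except X as e", PY3_FORM)]:
    print(f"{label}: {'ok' if compiles(src) else 'SyntaxError'}")
```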
Hi, I'm having a problem preprocessing the data for the edge probing tasks. I have correctly downloaded conll-formatted-ontonotes-5.0. However, in the preprocessing step, stats["count"] is zero, so I can't proceed; my error log is below.
I followed all the README.md instructions in the edge-probing/data folder, and I think allennlp's OntoNotes dataset_iterator isn't working, so it doesn't generate any sentences.
Is there any way I can solve this?