Closed jeswan closed 4 years ago
Comment by liebscher Tuesday Nov 05, 2019 at 04:23 GMT
I am getting the same error despite following the readme as closely as possible.
Just pulled the repo the other day.
Comment by iftenney Tuesday Nov 05, 2019 at 21:35 GMT
I'm out of the office now, but will debug this later this week.
Comment by iftenney Friday Nov 22, 2019 at 05:22 GMT
Sorry for the delay. Working on this; if you need a copy in the meantime, please email me (iftenney -at- gmail).
Comment by iftenney Friday Nov 22, 2019 at 05:40 GMT
Can you give the exact command you used, and any output before the error message?
I'm not able to reproduce. Tested the following:
# set up a fresh environment
git clone --recursive git@github.com:nyu-mll/jiant.git jiant
cd jiant
conda env create -f environment.yml
conda activate jiant
# process each OntoNotes task
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=ner -o /tmp/onto_ner
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=const -o /tmp/onto_const
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=srl -o /tmp/onto_srl
and didn't see a crash.
If you're using get_and_process_all_data.sh, did you set the path to ontonotes at the beginning of the script?
What does your OntoNotes (conll-formatted-ontonotes-5.0) directory contain?
Which version of allennlp do you have installed? (I tested with 0.8.4 per environment.yml)
Comment by minstar Friday Nov 22, 2019 at 07:11 GMT
My allennlp version is the same as yours, and I have changed the path to ontonotes in get_and_process_all_data.sh as you did.
Also, my conll-formatted-ontonotes-5.0 contains data/conll-2012-test, data/development, data/train, data/test.
Each data directory has annotations in the bc, bn, mz, nw, pt, tc, wb folders.
However, I tried the first extract_ontonotes_all.py command, which is
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref
but it shows the same error as above.
Comment by iftenney Sunday Nov 24, 2019 at 22:49 GMT
Does your OntoNotes folder have the .conll files?
cd conll-formatted-ontonotes-5.0
ls data/train/data/english/annotations/bc/cctv/00
Should look something like:
cctv_0001.gold_conll cctv_0002.gold_skel cctv_0004.gold_conll
cctv_0001.gold_skel cctv_0003.gold_conll cctv_0004.gold_skel
cctv_0002.gold_conll cctv_0003.gold_skel
The LDC corpus doesn't include the .conll files by default; they have to be generated by step 3 from http://cemantix.org/data/ontonotes.html.
(Also see the AllenNLP corpus reader documentation at https://github.com/allenai/allennlp/blob/v0.8.4/allennlp/data/dataset_readers/dataset_utils/ontonotes.py#L83)
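The directory check above can be extended to the whole corpus. The following is a minimal sketch, not part of jiant or AllenNLP: the helper name count_suffixes is hypothetical, and it only assumes that the generated files end in .gold_conll and the skeleton files in .gold_skel, as in the listing above.

```python
from collections import Counter
from pathlib import Path


def count_suffixes(root):
    """Count files under `root` by extension, e.g. '.gold_conll' or '.gold_skel'.

    Hypothetical helper for illustration; it walks the tree recursively
    and ignores directories.
    """
    return Counter(p.suffix for p in Path(root).rglob("*") if p.is_file())


# Example usage (path is an assumption; adjust to your checkout):
# counts = count_suffixes("conll-formatted-ontonotes-5.0/data/train")
# print(counts[".gold_conll"], counts[".gold_skel"])
```

If the count for .gold_conll is zero while .gold_skel is non-zero, the conversion step has probably not been run yet, which would be consistent with the extraction script finding no sentences.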
Comment by iftenney Monday Nov 25, 2019 at 19:57 GMT
Great, glad that helped! Please let me know if you have any other questions.
Comment by Raghava14 Thursday Dec 19, 2019 at 17:08 GMT
Hi, I ran into the same issue too. I am trying to do step 3 from http://cemantix.org/data/ontonotes.html, but where can I download the skeleton2conll.sh scripts?
The download link on the page is either missing or seems broken. Is there any solution?
Comment by Raghava14 Saturday Dec 21, 2019 at 13:07 GMT
Hi, I found the scripts but I am running into an issue saying
could not find the gold parse [conll-formatted-ontonotes-5.0/data/train/data/english/annotations/mz/sinorama/10/ectb_1031.parse] in the ontonotes distribution ... exiting ...
These are the steps I followed:
1. Downloaded v12 from here
2. Uncompressed the file to get conll-formatted-ontonotes-5.0-12, which contained conll-formatted-ontonotes-5.0
3. Ran ls conll-formatted-ontonotes-5.0/data/train/data/english/annotations/bc/cctv/00 and got
cctv_0001.gold_skel cctv_0002.gold_skel cctv_0003.gold_skel cctv_0004.gold_skel
which didn't include the .conll files, so I tried following step 3 from http://cemantix.org/data/ontonotes.html by downloading the scripts from https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO/tree/master/conll-formatted-ontonotes-5.0/scripts and placing them in the conll-formatted-ontonotes-5.0-12/scripts folder.
4. Tried running bash conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D conll-formatted-ontonotes-5.0/data/train/data/ conll-formatted-ontonotes-5.0/
but I am getting the above error. Am I doing anything wrong here? Please help.
@minstar
Comment by lovodkin93 Tuesday Mar 31, 2020 at 22:36 GMT
So I used the scripts Raghava14 provided, which are described in http://cemantix.org/data/ontonotes.html, and got the following error:
File "../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py", line 392
except InvalidSexprException, e:
^
SyntaxError: invalid syntax
Exit code: 1
./skeleton2conll.sh: line 93: break: only meaningful in a `for', `while', or `until' loop
-> python ../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py ../../../ontonotes-release-5.0/data/files/data/english/annotations/mz/sinorama/10/ectb_1029.parse ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_skel ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_conll -edited --text
Has anyone encountered this error who can help me solve it? Thanks! @iftenney
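Note on the traceback above: "except InvalidSexprException, e:" is the Python 2 spelling of an exception handler, so a SyntaxError on that line suggests skeleton2conll.py was run with a Python 3 interpreter. Running the script under Python 2 or porting the handler are the two likely options; neither is confirmed in this thread. The sketch below only illustrates the Python 3 spelling. The exception class and the toy parser are stand-ins defined here for illustration, not code from skeleton2conll.py.

```python
class InvalidSexprException(Exception):
    """Stand-in for the class named in the traceback (hypothetical definition)."""


def parse_sexpr(text):
    """Toy check used only for this example: reject unbalanced parentheses."""
    if text.count("(") != text.count(")"):
        raise InvalidSexprException("unbalanced parentheses: %r" % text)
    return text


def safe_parse(text):
    try:
        return parse_sexpr(text)
    # Python 2 wrote this handler as: except InvalidSexprException, e:
    except InvalidSexprException as e:
        return "error: %s" % e
```

The same "as" form is also accepted by Python 2.6 and later, so a script ported this way should still run under the older interpreter, provided nothing else in it is Python 2 only.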
Issue by minstar Monday Nov 04, 2019 at 04:24 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/950
Hi, I have a problem with preprocessing for the edge probing tasks. I have correctly downloaded "conll-formatted-ontonotes-5.0". However, in the preprocessing step, stats["count"] is zero, so I cannot proceed. Below is my error log.
After following the README.md in the edge-probing/data folder, I think allennlp's ontonotes dataset_iterator isn't working, so it doesn't generate any sentences.
Is there any way I can solve this?