nyu-mll / jiant-v1-legacy

The jiant toolkit for general-purpose text understanding models
MIT License

[CLOSED] [ZeroDivisionError at ontonotes dataset in edge-probing task] #950

Closed jeswan closed 4 years ago

jeswan commented 4 years ago

Issue by minstar Monday Nov 04, 2019 at 04:24 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/950


Hi, I have a problem with preprocessing for the edge probing tasks. I have correctly downloaded "conll-formatted-ontonotes-5.0". However, in the preprocessing step, stats["count"] is zero, so I cannot proceed. My error log is below.

File "/path-to-jiant/probing/data/utils.py", line 83, in to_series
    s["token.mean_count"] = stats["token.count"] / stats["count"] 
ZeroDivisionError: division by zero

I followed everything in the README.md in the edge-probing/data folder, so I think allennlp's ontonotes dataset_iterator isn't working and therefore doesn't generate any sentences.

Is there any way to solve this?
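The traceback above shows a plain division by stats["count"], which fails whenever no examples were read. Below is a minimal sketch of a guard that turns that case into a more descriptive error. The function name mean_token_count is hypothetical and this is not jiant's actual code; only the two dictionary keys are taken from the traceback.

```python
def mean_token_count(stats):
    """Return tokens per example, failing loudly when no examples were read.

    `stats` is assumed to be a dict with the "count" and "token.count" keys
    seen in the traceback; everything else here is illustrative.
    """
    count = stats.get("count", 0)
    if count == 0:
        # An empty count usually means the reader found no input files.
        raise ValueError(
            "No examples were read (count == 0); check that the input "
            "directory actually contains the expected data files."
        )
    return stats["token.count"] / count


if __name__ == "__main__":
    print(mean_token_count({"count": 4, "token.count": 100}))
```

With a guard like this, an empty corpus directory would surface as a message about missing input rather than a ZeroDivisionError deep in the statistics code.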

jeswan commented 4 years ago

Comment by liebscher Tuesday Nov 05, 2019 at 04:23 GMT


I am getting the same error despite following the readme as closely as possible.

Just pulled the repo the other day.

jeswan commented 4 years ago

Comment by pruksmhc Tuesday Nov 05, 2019 at 10:59 GMT


@iftenney

jeswan commented 4 years ago

Comment by iftenney Tuesday Nov 05, 2019 at 21:35 GMT


I'm out of the office now, but will debug this later this week.

jeswan commented 4 years ago

Comment by minstar Friday Nov 22, 2019 at 04:20 GMT


Is there any change that I can try now?

jeswan commented 4 years ago

Comment by iftenney Friday Nov 22, 2019 at 05:22 GMT


Sorry for the delay. I'm working on this; if you need a copy in the meantime, please email me (iftenney -at- gmail).

jeswan commented 4 years ago

Comment by iftenney Friday Nov 22, 2019 at 05:40 GMT


Can you give the exact command you used, and any output before the error message?

I'm not able to reproduce. Tested the following:

# set up a fresh environment
git clone --recursive git@github.com:nyu-mll/jiant.git jiant
cd jiant
conda env create -f environment.yml
conda activate jiant

# process each OntoNotes task
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=ner -o /tmp/onto_ner
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=const -o /tmp/onto_const
python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=srl -o /tmp/onto_srl

and didn't see a crash.
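The four invocations above differ only in the task name, so they can be generated in a loop. A small sketch, assuming the same script path and flags as in the commands above; the helper name build_extract_commands is hypothetical, and it only builds the argument lists rather than running them.

```python
import os
import shlex

# Task names taken from the commands above.
TASKS = ("coref", "ner", "const", "srl")


def build_extract_commands(ontonotes_dir, out_root, tasks=TASKS):
    """Build one extract_ontonotes_all.py command (as an argv list) per task."""
    ontonotes_dir = os.path.expanduser(ontonotes_dir)  # expand a leading "~"
    commands = []
    for task in tasks:
        commands.append([
            "python", "probing/data/extract_ontonotes_all.py",
            "--ontonotes", ontonotes_dir,
            f"--tasks={task}",
            "-o", f"{out_root}/onto_{task}",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_extract_commands("~/data/conll-formatted-ontonotes-5.0", "/tmp"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run from the repository root if you prefer to launch the steps from Python.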

jeswan commented 4 years ago

Comment by minstar Friday Nov 22, 2019 at 07:11 GMT


My allennlp version is the same as yours, and I changed my OntoNotes path in get_and_process_all_data.sh as you did. Also, my conll-formatted-ontonotes-5.0 contains data/conll-2012-test, data/development, data/train, and data/test. Each of these has annotations in the bc, bn, mz, nw, pt, tc, and wb folders. However, when I tried the first extract_ontonotes_all.py command, python probing/data/extract_ontonotes_all.py --ontonotes ~/data/conll-formatted-ontonotes-5.0 --tasks=coref -o /tmp/onto_coref, it showed the same error as above.

jeswan commented 4 years ago

Comment by iftenney Sunday Nov 24, 2019 at 22:49 GMT


Does your OntoNotes folder have the .conll files?

cd conll-formatted-ontonotes-5.0
ls data/train/data/english/annotations/bc/cctv/00

Should look something like:

cctv_0001.gold_conll  cctv_0002.gold_skel   cctv_0004.gold_conll
cctv_0001.gold_skel   cctv_0003.gold_conll  cctv_0004.gold_skel
cctv_0002.gold_conll  cctv_0003.gold_skel

The LDC corpus doesn't include the .conll files by default; they have to be generated by step 3 from http://cemantix.org/data/ontonotes.html.

(Also see the AllenNLP corpus reader documentation at https://github.com/allenai/allennlp/blob/v0.8.4/allennlp/data/dataset_readers/dataset_utils/ontonotes.py#L83)
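The same check can be run over the whole corpus rather than one folder. A minimal sketch, assuming the .gold_conll and .gold_skel suffixes shown in the listing above; the function name count_ontonotes_files is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_ontonotes_files(root):
    """Count files under `root` by suffix (.gold_conll vs .gold_skel)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Path.suffix is the part after the last dot, e.g. ".gold_conll".
        if path.is_file() and path.suffix in {".gold_conll", ".gold_skel"}:
            counts[path.suffix] += 1
    return counts


if __name__ == "__main__":
    counts = count_ontonotes_files("conll-formatted-ontonotes-5.0")
    print("gold_conll files:", counts[".gold_conll"])
    print("gold_skel files: ", counts[".gold_skel"])
```

If the .gold_conll count is zero while .gold_skel files are present, the conversion step has most likely not been run yet.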

jeswan commented 4 years ago

Comment by minstar Monday Nov 25, 2019 at 07:12 GMT


Oh, now I've solved my problem. Thanks a lot!

jeswan commented 4 years ago

Comment by iftenney Monday Nov 25, 2019 at 19:57 GMT


Great, glad that helped! Please let me know if you have any other questions.

jeswan commented 4 years ago

Comment by Raghava14 Thursday Dec 19, 2019 at 17:08 GMT


Hi, I ran into the same issue too. I am trying to do step 3 from http://cemantix.org/data/ontonotes.html, but where can I download the skeleton2conll.sh script? The download link on the page is either missing or broken. Is there any solution?

jeswan commented 4 years ago

Comment by Raghava14 Thursday Dec 19, 2019 at 17:16 GMT


Okay, after some searching I found the scripts here; it seems the download link on the website is broken.

jeswan commented 4 years ago

Comment by Raghava14 Saturday Dec 21, 2019 at 13:07 GMT


Hi, I found the scripts, but I am running into an error saying: could not find the gold parse [conll-formatted-ontonotes-5.0/data/train/data/english/annotations/mz/sinorama/10/ectb_1031.parse] in the ontonotes distribution ... exiting ...

These are the steps I took:

  1. Downloaded v12 from here

  2. Uncompressed the file to get conll-formatted-ontonotes-5.0-12, which contained conll-formatted-ontonotes-5.0

  3. Ran ls conll-formatted-ontonotes-5.0/data/train/data/english/annotations/bc/cctv/00 and got cctv_0001.gold_skel cctv_0002.gold_skel cctv_0003.gold_skel cctv_0004.gold_skel, with no .conll files, so I tried following step 3 from http://cemantix.org/data/ontonotes.html by downloading the scripts from https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO/tree/master/conll-formatted-ontonotes-5.0/scripts and placing them in the conll-formatted-ontonotes-5.0-12/scripts folder.

  4. Tried running bash conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D conll-formatted-ontonotes-5.0/data/train/data/ conll-formatted-ontonotes-5.0/ but I am getting the above error.

Am I doing anything wrong here? Please help @minstar

jeswan commented 4 years ago

Comment by lovodkin93 Tuesday Mar 31, 2020 at 22:36 GMT


So I used the scripts Raghava14 provided, which are described in http://cemantix.org/data/ontonotes.html, and got the following error:

File "../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py", line 392
    except InvalidSexprException, e:
                                ^
SyntaxError: invalid syntax
Exit code: 1
./skeleton2conll.sh: line 93: break: only meaningful in a `for', `while', or `until' loop
-> python ../../conll-formatted-ontonotes-5.0/scripts/skeleton2conll.py ../../../ontonotes-release-5.0/data/files/data/english/annotations/mz/sinorama/10/ectb_1029.parse ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_skel ../../conll-formatted-ontonotes-5.0/data/conll-2012-test/data/english/annotations/mz/sinorama/10/ectb_1029.gold_conll -edited --text

Has anyone encountered this error, and can you help me solve it? Thanks! @iftenney
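A note on the SyntaxError: the form except InvalidSexprException, e: is the Python 2 spelling for binding a caught exception to a name, so the traceback suggests the script was written for Python 2 and was run with a Python 3 interpreter. This is an inference from the error text, not something confirmed in this thread. The sketch below shows the equivalent handler in the "as" form that Python 3 accepts; the exception class and the parse function are stand-ins, not the real skeleton2conll.py code.

```python
class InvalidSexprException(Exception):
    """Stand-in for the exception named in the traceback (illustrative only)."""


def parse(sexpr):
    """Toy parser: reject strings with unbalanced parentheses."""
    if sexpr.count("(") != sexpr.count(")"):
        raise InvalidSexprException("unbalanced parentheses: %r" % sexpr)
    return sexpr


def safe_parse(sexpr):
    try:
        return parse(sexpr)
    # Python 2 spelling:  except InvalidSexprException, e:
    # Python 3 spelling:  except InvalidSexprException as e:
    except InvalidSexprException as e:
        return "error: %s" % e


if __name__ == "__main__":
    print(safe_parse("(TOP (S))"))
    print(safe_parse("(TOP (S)"))
```

Running the original script under a Python 2 interpreter, if one is available, would avoid editing it at all; porting it to Python 3 may require more changes than this one line.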