Open marc88 opened 5 years ago
@marc88 I have the same issue here. Did you fix it successfully?
I figured it out.
In the test set, the CoNLL file names end with gold_parse_conll instead of gold_conll. So you need to replace the line
[y for x in os.walk(FLAGS.in_file) for y in glob(os.path.join(x[0], '*_gold_conll'))\
if "/"+data_type+"/" in y and "/english/" in y]
with
[y for x in os.walk(FLAGS.in_file) for y in glob(os.path.join(x[0], '*_gold_parse_conll'))\
if "/"+data_type+"/" in y and "/english/" in y]
for the test set.
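If you would rather not edit the line by hand each time you switch splits, the same filter can be wrapped in a small helper that picks the suffix from the split name. This is only a sketch: `find_conll_files` is a name I made up, not something in the repo, and it assumes the test split is passed as data_type "test" (i.e. the paths contain "/test/").

```python
import os
from glob import glob


def find_conll_files(in_dir, data_type,
                     default_suffix="_gold_conll",
                     test_suffix="_gold_parse_conll"):
    """Collect English CoNLL files for one split (hypothetical helper).

    Assumption: only the split named "test" uses the _gold_parse_conll
    suffix; every other split uses _gold_conll.
    """
    suffix = test_suffix if data_type == "test" else default_suffix
    return sorted(
        y
        for x in os.walk(in_dir)
        for y in glob(os.path.join(x[0], "*" + suffix))
        # Same path filter as the line quoted above.
        if "/" + data_type + "/" in y and "/english/" in y
    )
```

Calling `find_conll_files(FLAGS.in_file, data_type)` in place of the list comprehension should then cover both cases, if the assumption about the split name holds for your layout.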
Hello, can anyone suggest what data processing needs to be done on the CoNLL-2012 data before calling the following?
./bin/preprocess.sh conf/ontonotes/dilated-cnn.conf
Currently, simply calling the preprocess.sh script as above does not write anything to the file below and, I suppose, goes into an infinite loop:
data/vocabs/ontonotes_cutoff_4.txt
I've downloaded the train v4, dev v4 and test v9 tarballs from http://conll.cemantix.org/2012/data.html
Edit: I was able to convert the OntoNotes files to CoNLL format successfully, but I am not sure what directory structure the preprocessing script expects. Can you help? The following is my directory structure:
$DILATED_CNN_NER_ROOT/data/conll-formatted-ontonotes-5.0
Structure of $DILATED_CNN_NER_ROOT/data/conll-formatted-ontonotes-5.0 (this directory has all the _gold_conll files; take the path below as an example):
/home/ss06886910/Strubel_IDCNN/data/conll-formatted-ontonotes-5.0/data/train/data/english/annotations/wb/c2e/00/c2e_0028.v4_gold_conll
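One way to sanity-check a layout like this is to count how many files under the data directory would pass the path filter quoted earlier in this thread, grouped by the first directory below the root. This is only a rough sketch: `count_conll_files` is a hypothetical helper, not part of the repo, and it assumes the preprocessing only looks for the two suffixes and the "/english/" path component mentioned above.

```python
import os
from collections import Counter


def count_conll_files(raw_data_dir,
                      suffixes=("_gold_conll", "_gold_parse_conll")):
    """Count English CoNLL files per top-level directory (hypothetical helper)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(raw_data_dir):
        for name in filenames:
            # str.endswith accepts a tuple of candidate suffixes.
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            if "/english/" not in path:
                continue
            # First path component below raw_data_dir, e.g. "train".
            split = os.path.relpath(path, raw_data_dir).split(os.sep)[0]
            counts[split] += 1
    return counts
```

If a split you expect does not show up in the result, or shows up with a count of zero, the files are probably not where the filter looks for them.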
I tried running with the following parameter in ontonotes.conf:
export raw_data_dir="$DATA_DIR/conll-formatted-ontonotes-5.0/data"
($DATA_DIR = $DILATED_CNN_NER_ROOT/data)
And I get the following error:
Regards