iesl / dilated-cnn-ner

Dilated CNNs for NER in TensorFlow

Make the code compatible with conll-2012 directory structure #12

Closed ghaddarAbs closed 6 years ago

ghaddarAbs commented 6 years ago

Make the OntoNotes preprocessing compatible with the directory structure produced by the skeleton2conll.sh script of the CoNLL-2012 shared task.

kamalkraj commented 6 years ago

It fails to preprocess data from the test folder. I think it is because the v9 test set doesn't contain //_gold_conll files.
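One quick way to confirm this kind of failure is to list the gold files under each split before running the preprocessing. The snippet below is only a sketch: `find_gold_files` is a hypothetical helper, not part of this repository, and it assumes that gold annotation files can be recognised by a name ending in `_gold_conll`.

```python
from pathlib import Path
from typing import List


def find_gold_files(root) -> List[Path]:
    """Return all files under `root` whose name ends with `_gold_conll`.

    Hypothetical helper: the suffix-based match is an assumption about
    how the gold files are named, not something taken from this repo.
    """
    root = Path(root)
    if not root.is_dir():
        # A missing split directory simply has no gold files.
        return []
    return sorted(p for p in root.rglob("*_gold_conll") if p.is_file())
```

If this returns an empty list for the test directory, the preprocessing has nothing to read for that split.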

ghaddarAbs commented 6 years ago

You must use the v4 train/dev/test split. Here are the stats:

num train examples: 59924 num train tokens: 1088503

num dev examples: 8262 num dev tokens: 152728

kamalkraj commented 6 years ago

The v4 test split is not available on the website. Can you share it here?

ghaddarAbs commented 6 years ago

Unfortunately I can't, the dataset is copyrighted! However, you can download OntoNotes from the LDC (it's free!). Then you have to follow the data preprocessing instructions of the CoNLL-2012 shared task.

kamalkraj commented 6 years ago

I have access to OntoNotes 5.0. On the CoNLL-2012 shared task data preprocessing website, only the v9 test set is available.

ghaddarAbs commented 6 years ago

What you are looking for is Test Key, directly below Test Data.

kamalkraj commented 6 years ago

Got it. Thanks.

Do you have any scripts that can extract all sentences and labels from the complete OntoNotes 5.0 dataset?

ghaddarAbs commented 6 years ago

The scripts are on the same page.
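For anyone who only needs tokens and entity labels once the `*_conll` files have been generated, a minimal sketch of a reader follows. It is not the shared task's script and not this repository's preprocessing. It assumes whitespace-separated columns with the word at index 3 and the bracketed named-entity annotation at index 10 (for example `(ORG*`, `*`, `*)`, `(PERSON)`), non-nested entities, and blank lines between sentences. `read_conll2012` and the two column constants are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

WORD_COL = 3   # assumed column index of the token
NER_COL = 10   # assumed column index of the named-entity brackets


def read_conll2012(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, BIO tags) per sentence from CoNLL-2012 style lines."""
    tokens: List[str] = []
    tags: List[str] = []
    open_label = None  # label of the entity currently being continued
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            # Skip "#begin document" / "#end document" markers.
            continue
        if not line:
            # A blank line closes the current sentence.
            if tokens:
                yield tokens, tags
            tokens, tags, open_label = [], [], None
            continue
        cols = line.split()
        word, ner = cols[WORD_COL], cols[NER_COL]
        if ner.startswith("("):
            # Entity starts here; it may also end on the same token.
            label = ner.strip("()*")
            tags.append("B-" + label)
            open_label = None if ner.endswith(")") else label
        elif open_label is not None:
            # Inside an entity opened on an earlier token.
            tags.append("I-" + open_label)
            if ner.endswith(")"):
                open_label = None
        else:
            tags.append("O")
        tokens.append(word)
    if tokens:
        # Flush a final sentence that is not followed by a blank line.
        yield tokens, tags
```

Typical use would be `with open(path) as f: sentences = list(read_conll2012(f))`, where `path` points at one of the generated files.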

strubell commented 6 years ago

Thanks!