juntaoy / biaffine-ner

Named Entity Recognition as Dependency Parsing
Apache License 2.0

Running with fixed embeddings only #6

Open amir-zeldes opened 4 years ago

amir-zeldes commented 4 years ago

Hi, and thanks for putting this code up! Is there a way to run the model with only fixed word embeddings (GloVe, fastText, etc.), without BERT?

juntaoy commented 4 years ago

If you comment out lines 197-207 in biaffine_ner_model.py and replace line 23 with self.lm_file = None, it should work without BERT.
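The underlying idea, as a minimal sketch (the function name and embedding sizes here are hypothetical, not the repo's actual code): the model concatenates fixed word embeddings, character embeddings, and optionally contextual LM features, so with lm_file = None the LM part is simply dropped.

```python
import numpy as np

def build_token_embeddings(context_word_emb, char_emb, lm_emb=None):
    # Concatenate fixed word embeddings with character embeddings;
    # append contextual LM (BERT) features only when they are available.
    parts = [context_word_emb, char_emb]
    if lm_emb is not None:  # lm_file = None -> no BERT features
        parts.append(lm_emb)
    return np.concatenate(parts, axis=-1)

# Fixed 50d word embeddings plus 8d char embeddings, no BERT:
emb = build_token_embeddings(np.zeros((6, 50)), np.zeros((6, 8)))
print(emb.shape)  # (6, 58)
```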

amir-zeldes commented 4 years ago

Thanks for the quick reply! This runs successfully now, but despite playing around with hyperparameters I can't get it to predict anything but the 'O' category. For context, I'm trying to get baseline numbers for various neural systems for nested NER in a low-resource setting for span detection. I'm not trying to get high numbers, but ideally non-zero :)

I have:

Loss decreases throughout training, but predictions on dev are always 'O'. Oddly, even if I feed the training set as eval_path, it still only predicts 'O' in candidate_ner_scores in evaluate. Any ideas would be appreciated!

Here is my config (I tried making the model really small here to get a non-zero result, but I've played with various values):

test = ${base}{
  train_path = train.jsonlines
  lm_path = xyz
  eval_path = dev.jsonlines
  test_path = test.jsonlines
  ner_types = ["thing"]
  char_vocab_path = "char_vocab.txt"
  context_embeddings = ${w2v_50d}
  lm_size = 1
  lm_layers = 1
  flat_ner = false
  contextualization_size = 40
  contextualization_layers = 1

  eval_frequency = 150
  report_frequency = 50
  log_root = logs
  max_step = 8000

  lstm_dropout_rate = 0.2
  lexical_dropout_rate = 0.2
  dropout_rate = 0.2
  learning_rate = 0.001
  ffnn_size = 30
  ffnn_depth = 1
  char_embedding_size = 4

}
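For reference, each line of train.jsonlines is a JSON document roughly of this shape. The field names here are my reading of the repo's data-loading code (which iterates over per-sentence token lists and (start, end, type) NER triples), so treat them as an assumption rather than the documented format.

```python
import json

# Hypothetical record; field names assumed, not verified against the repo.
record = {
    "doc_key": "doc_0",
    "sentences": [["John", "Smith", "visited", "New", "York"]],
    "ners": [[[0, 1, "thing"], [3, 4, "thing"]]],  # (start, end, type) per sentence
}
line = json.dumps(record)
print(json.loads(line)["ners"][0][0])  # [0, 1, 'thing']
```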

And here is some training output with decreasing loss, but 0 f-score on dev:

[50] loss=5979.44, steps/s=25.96
[100] loss=6474.12, steps/s=28.04
[150] loss=5528.80, steps/s=29.18
Loaded 13 eval examples.
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 28924.54 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[150] evaL_f1=0.00, max_f1=0.00 at step 0
[200] loss=4291.26, steps/s=27.66
[250] loss=4213.31, steps/s=28.87
[300] loss=4266.32, steps/s=29.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32010.57 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[300] evaL_f1=0.00, max_f1=0.00 at step 0
[350] loss=4609.43, steps/s=28.40
[400] loss=3626.02, steps/s=28.74
[450] loss=3158.52, steps/s=29.28
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33300.01 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[450] evaL_f1=0.00, max_f1=0.00 at step 0
[500] loss=3116.23, steps/s=28.55
[550] loss=2190.09, steps/s=29.40
[600] loss=3685.80, steps/s=28.79
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 33556.00 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[600] evaL_f1=0.00, max_f1=0.00 at step 0
[650] loss=1716.10, steps/s=28.78
[700] loss=2771.52, steps/s=28.65
[750] loss=1920.65, steps/s=28.89
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32705.53 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[750] evaL_f1=0.00, max_f1=0.00 at step 0
[800] loss=1655.52, steps/s=28.60
[850] loss=1689.42, steps/s=28.73
[900] loss=1852.03, steps/s=28.62
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 32728.62 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
[900] evaL_f1=0.00, max_f1=0.00 at step 0
[950] loss=1338.44, steps/s=28.34
[1000] loss=1473.17, steps/s=28.32
[1050] loss=1163.35, steps/s=28.45
Evaluated 1/13 examples.
Evaluated 11/13 examples.
Time used: 0 second, 30699.79 w/s
Mention F1: 0.00%
Mention recall: 0.00%
Mention precision: 0.00%
amir-zeldes commented 4 years ago

Never mind, I figured it out. BTW, I ported this to TF 2.2 with tf.compat.v1 and Python 3.x; I can push it as a fork/PR if you're interested.

juntaoy commented 4 years ago

Hi, sorry for the late reply :) May I ask how you solved the all-'O' problem? I haven't had this problem in any of my experiments. I assume it might be because of the size of the corpus. Did you do undersampling?
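One way to see why a small corpus can make the model collapse to all 'O': every span (s, e) with s <= e is a candidate in the biaffine formulation, so negatives vastly outnumber the gold mentions. A quick illustration with made-up numbers:

```python
def candidate_spans(n):
    # Number of candidate spans (s, e) with 0 <= s <= e < n.
    return n * (n + 1) // 2

# A 30-token sentence with 2 gold mentions:
total = candidate_spans(30)
print(total, total - 2)  # 465 candidates, 463 of them negative
```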

For the updated code, feel free to push it, and if you could attach the address of your repository here, people who want to use it can find it easily :) Thanks.

amir-zeldes commented 4 years ago

OK, my changes are now in PR #9.

You can also see the low-resource parameters I used in experiments.conf; I'm getting F1 = 0.757 for span detection (no entity type classification). I'm comparing it to syntax-tree-based spans (predicting candidates for type classification using a normal dependency parser), which currently gets 82.3 with predicted POS tags and parses, and 87 with gold parses.

juntaoy commented 4 years ago

Thanks a lot, Amir, I've included a link in the readme so people can find your TF 2.0-ready code. For span detection in the under-resourced case, you might want to use undersampling: mask out a large portion of the negatives during training, say keeping T (e.g. 5) negative examples per positive example. You can do this by simply adding a new boolean placeholder (us_masks) with the same shape as gold_labels and modifying lines 135-145 of your code as follows:

   us_ratio = config['under_sampling_ratio']  # can be calculated as T * num_positive_examples / num_negative_examples
   gold_labels = []
   us_masks = []
   for sid, sent in enumerate(sentences):
     ner = {(s, e): self.ner_maps[t] for s, e, t in ners[sid]}
     for s in range(len(sent)):
       for e in range(s, len(sent)):
         label = ner.get((s, e), 0) if is_training else 0
         gold_labels.append(label)
         # keep all positives; keep each negative with probability us_ratio
         mask = (np.random.rand() < us_ratio if label == 0 else True) if is_training else True
         us_masks.append(mask)
   us_masks = np.array(us_masks)
   gold_labels = np.array(gold_labels)

   example_tensors = (tokens, context_word_emb, lm_emb, char_index, text_len, is_training, gold_labels, us_masks)

And before computing the loss at line 246:

   candidate_ner_scores = tf.boolean_mask(candidate_ner_scores, us_masks)
   gold_labels = tf.boolean_mask(gold_labels, us_masks)

I find this method very helpful when dealing with under-resourced cases (I've used it for other tasks with a similar architecture).
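The same masking idea, as a self-contained sketch outside the model (pure NumPy; the function name is mine, not the repo's):

```python
import numpy as np

def undersampling_mask(gold_labels, us_ratio, rng=None):
    # Keep every positive example; keep each negative with probability us_ratio.
    rng = rng if rng is not None else np.random.default_rng(0)
    gold_labels = np.asarray(gold_labels)
    return (gold_labels != 0) | (rng.random(gold_labels.shape) < us_ratio)

labels = np.array([0] * 95 + [1] * 5)          # 95 negatives, 5 positives
mask = undersampling_mask(labels, 5 * 5 / 95)  # aim for ~5 negatives per positive
print(mask[labels != 0].all())                 # positives always survive -> True
```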