Closed QingkaiZeng closed 4 years ago
Hi, we trained SpERT on the combined train+dev dataset after hyperparameter tuning. To do so, set 'data/datasets/scierc/scierc_train_dev.json' as 'train_path' in the config file. However, this only makes a minor difference (if any) on SciERC compared to training on train alone - certainly not a difference of 2.7 F1. Of course results may vary a bit, which is why we report the average of 5 runs in the paper. Still, 67.6 is quite low, so either you had (very) bad luck with the run or something is wrong with your configuration. Could you please post the content of your configuration file? Also, which PyTorch version are you using?
Sure, the PyTorch version is 1.4.0. I used scibert-scivocab-uncased from the HuggingFace PyTorch models. The other hyperparameters are as follows:

```
label = scierc_train
model_type = spert
model_path = scibert_scivocab_uncased/weights
tokenizer_path = scibert_scivocab_uncased/weights
train_path = data/datasets/scierc/scierc_train.json
valid_path = data/datasets/scierc/scierc_dev.json
test_path = data/datasets/scierc/scierc_test.json
types_path = data/datasets/scierc/scierc_types.json
train_batch_size = 2
eval_batch_size = 1
neg_entity_count = 100
neg_relation_count = 100
epochs = 200
lr = 5e-5
lr_warmup = 0.1
weight_decay = 0.01
max_grad_norm = 1.0
rel_filter_threshold = 0.4
size_embedding = 25
prop_drop = 0.1
max_span_size = 10
store_predictions = true
store_examples = true
sampling_processes = 4
sampling_limit = 100
max_pairs = 1000
final_eval = false
log_path = data/log/
save_path = data/save/
```
First, please set epochs to 20. This should be more than enough and is also the number we use in the paper. Also, we use SciBERT (cased), not the uncased version, as the encoder.
More importantly, 'test_path' is not used during training. So with your current configuration, you are training on 'scierc_train.json' and evaluating on 'scierc_dev.json'. To reproduce the results in the paper, set 'valid_path' to 'scierc_test.json'. There is no early stopping or anything similar; the valid file is simply evaluated after every epoch (or only after the last epoch if 'final_eval = true'). And as mentioned above, you may also set 'train_path' to 'scierc_train_dev.json'.
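Putting the above together, the relevant lines in your train config would then read (paths taken from your posted configuration):

```
train_path = data/datasets/scierc/scierc_train_dev.json
valid_path = data/datasets/scierc/scierc_test.json
epochs = 20
```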
A second option is to use the evaluation script with your already trained model. This is the same evaluation procedure as during training. Your model should have been saved under 'save_path' after training. So in order to only evaluate your trained model, see 'example_eval.conf' and change 'model_path' and 'tokenizer_path' to point to your trained model. Also set 'dataset_path' to 'scierc_test.json' and 'types_path' to 'scierc_types.json'. After that, you can evaluate your trained model on the test dataset with 'python ./spert.py eval --config configs/example_eval.conf'. However, since you changed the number of epochs and used another encoder, your results may vary - so it may be better to retrain SpERT with the correct configuration.
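As a rough sketch, the eval config would then look like the following - note that `<your-saved-model-dir>` is a placeholder you need to replace with the actual directory your trained model was written to under 'save_path':

```
model_path = <your-saved-model-dir>
tokenizer_path = <your-saved-model-dir>
dataset_path = data/datasets/scierc/scierc_test.json
types_path = data/datasets/scierc/scierc_types.json
```

Then run `python ./spert.py eval --config configs/example_eval.conf`.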
Please leave a comment if this issue is still unsolved.
Hi, I have run your code on SciERC and can only achieve a micro-F1 of 67.6. Could you show me how to tune your model to reproduce the result reported in the paper on SciERC (micro-F1 70.33)?