troublemaker-r / Chinese_Coreference_Resolution

Chinese coreference resolution based on SpanBERT, implemented in PyTorch

Parameters match the official coref settings, but F1 will not go above 70. What could be the problem? #21

Open learner-crapy opened 1 year ago

learner-crapy commented 1 year ago

I trained on the Chinese and English OntoNotes data separately using a single RTX 4090 and obtained the results below.

Chinese: RoBERTa_zh_L12_PyTorch
image

English: spanbert_base
image

The parameters I used are as follows:

    # Computation limits.
    max_top_antecedents = 50
    max_training_sentences = 11
    top_span_ratio = 0.4
    max_num_speakers = 20
    max_segment_len = 128

    # Learning
    bert_learning_rate = 1e-05
    task_learning_rate = 0.0002
    adam_eps = 1e-6
    dropout_rate = 0.3

    # Task choice
    num_docs = 2802
    num_epochs = 30
    do_train = true
    do_eval = true
    do_test = true
    do_one_example_test = true
    eval_frequency = 100
    report_frequency = 10

    # Model hyperparameters.
    genres = ["bc", "bn", "mz", "nw", "tc", "wb"]
    coref_depth = 2
    ffnn_size = 3000
    feature_size = 20
    max_span_width = 30
    use_metadata = true
    use_features = true
    use_segment_distance = true
    model_heads = true
    fine_grained = true
    use_prior = true
    single_example = true
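
One way to check the "same as official" claim mechanically is to parse both configs and diff them key by key. The following is a minimal sketch, not part of this repository: `parse_settings` and `diff_settings` are hypothetical helper names, only a few of the settings above are included, and the `reference` dict is a deliberately altered copy used only to show the output format. It does not contain the official values.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(mine, reference):
    """Return {key: (my_value, reference_value)} for every key whose values differ."""
    keys = sorted(set(mine) | set(reference))
    return {
        k: (mine.get(k), reference.get(k))
        for k in keys
        if mine.get(k) != reference.get(k)
    }


# A subset of the settings listed in this issue.
mine = parse_settings("""
# Computation limits.
max_training_sentences = 11
max_segment_len = 128
# Learning
bert_learning_rate = 1e-05
task_learning_rate = 0.0002
""")

# Hypothetical reference: a copy with one value changed on purpose, only to
# demonstrate the diff. Replace it with settings parsed from the config that
# you want to compare against.
reference = dict(mine, max_segment_len="256")

differences = diff_settings(mine, reference)
for key, (my_value, ref_value) in differences.items():
    print(f"{key}: mine={my_value} reference={ref_value}")
```

Values are compared as raw strings, so `1e-05` and `1e-5` would be reported as different; convert to numbers first if that matters for the comparison.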

However, I see that the official coref implementation reaches an F1 of 77 on the English corpus with the base model. Where might the problem be?

image