manliu1225 / mrc-for-flat-nested-ner


train problem: stuck on the last line of output, "current training loss is : 0.01826123334467411", and does not move #1

Open gjy-code opened 2 years ago

gjy-code commented 2 years ago

Hello, here is the output of my run (onto4 dataset):

train_zh_onto.sh: line 7: EXP-ID=22_1: command not found
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Please notice that merge the args_dict and json_config ... ...
{
  "bert_frozen": "false",
  "hidden_size": 768,
  "hidden_dropout_prob": 0.2,
  "classifier_sign": "multi_nonlinear",
  "clip_grad": 1,
  "bert_config": {
    "attention_probs_dropout_prob": 0.1,
    "directionality": "bidi",
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pooler_fc_size": 768,
    "pooler_num_attention_heads": 12,
    "pooler_num_fc_layers": 3,
    "pooler_size_per_head": 128,
    "pooler_type": "first_token_transform",
    "type_vocab_size": 2,
    "vocab_size": 21128
  },
  "config_path": "/home/amax/work/gjy/mrcc/config/zh_bert.json",
  "data_dir": "/home/amax/work/gjy/mrcc/data_preprocess/example/zh_ontonotes4",
  "bert_model": "/home/amax/work/gjy/Bert-Ner-Demo-master/chinese_L-12_H-768_A-12",
  "task_name": null,
  "max_seq_length": 100,
  "train_batch_size": 16,
  "dev_batch_size": 16,
  "test_batch_size": 16,
  "checkpoint": 600,
  "learning_rate": 8e-06,
  "num_train_epochs": 10,
  "warmup_proportion": -1.0,
  "local_rank": -1,
  "gradient_accumulation_steps": 1,
  "seed": 2333,
  "export_model": true,
  "output_dir": "/home/amax/work/gjy/mrcc/export/zh_onto/mrc-ner-zh_onto-2020-05-12--100-8e-6-16-0.3",
  "data_sign": "zh_onto",
  "weight_start": 1.0,
  "weight_end": 1.0,
  "weight_span": 1.0,
  "entity_sign": "flat",
  "n_gpu": 1,
  "dropout": 0.3,
  "entity_threshold": 0.5,
  "data_cache": true
}

current data_sign: zh_onto

loading train data ... ... 62896 62896 train data loaded

loading dev data ... ... 17204 17204 dev data loaded

loading test data ... ... 17384 17384 test data loaded
数据已加载完毕! (Data loaded!)
load_model模型已加载完毕! (load_model: model loaded!)
######################################################################
EPOCH: 0 -----------------------------*-
current training loss is : 0.01826123334467411

My problem: the run only gets this far and then stays stuck on the last line indefinitely.

AlexXx-Wu commented 2 years ago

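It may be stuck in an infinite loop inside the bmes_decode function, which is called during flat_ner_performance. You could print the test set to check. I ran into this problem before as well. Could you leave your email address? On my side, after training finishes, every predicted label is 'O'.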
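To illustrate the kind of hang described above, here is a minimal sketch of a BMES decoder whose loop index advances on every branch, so malformed tag sequences (for example a B tag with no matching E tag) cannot stall it. This is not the repository's bmes_decode; the function name `bmes_decode_safe`, the `B-X / M-X / E-X / S-X / O` tag format, and the (type, start, end) output with an exclusive end are all assumptions made for illustration.

```python
from typing import List, Tuple


def bmes_decode_safe(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BMES tags into (entity_type, start, end) spans, end exclusive.

    Hypothetical helper, not the repository's bmes_decode. Assumes tags
    look like "B-PER", "M-PER", "E-PER", "S-LOC" and "O".
    """
    spans = []
    i = 0
    n = len(labels)
    while i < n:
        # "B-PER" -> ("B", "-", "PER"); "O" -> ("O", "", "")
        prefix, _, etype = labels[i].partition("-")
        if prefix == "S" and etype:
            # Single-token entity.
            spans.append((etype, i, i + 1))
            i += 1
        elif prefix == "B" and etype:
            j = i + 1
            # Consume the M tags that share the entity type.
            while j < n and labels[j] == f"M-{etype}":
                j += 1
            if j < n and labels[j] == f"E-{etype}":
                # Well-formed span: B, zero or more M, then E.
                spans.append((etype, i, j + 1))
                i = j + 1
            else:
                # No matching E tag: drop this B tag and keep scanning.
                i += 1
        else:
            # "O", stray M/E tags, or anything unexpected.
            i += 1
    return spans


if __name__ == "__main__":
    tags = ["B-PER", "M-PER", "E-PER", "O", "S-LOC", "B-ORG", "M-ORG", "O"]
    print(bmes_decode_safe(tags))
```

Running the same tag sequences through the project's own decoder and through a sketch like this one may help show whether malformed predictions are what triggers the hang.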

gjy-code commented 2 years ago

@.***

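On 2022-01-07 18:32:57, "AlexXx-Wu" @.***> wrote:

It may be stuck in an infinite loop inside the bmes_decode function, which is called during flat_ner_performance. You could print the test set to check. I ran into this problem before as well. Could you leave your email address? On my side, after training finishes, every predicted label is 'O'.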
