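The MSRA dataset for Chinese word segmentation (CWS) is available from the official SIGHAN Bakeoff 2005 website (http://sighan.cs.uchicago.edu/bakeoff2005/). Because of copyright restrictions, we cannot provide it ourselves. For your reference, in the expected data format each character is followed by its segmentation tag (B = first character of a word, E = last character of a word, S = single-character word), like this: 扬 B 帆 E 远 B 东 E 做 S 与 S 中 B 国 E 合 B 作 E 的 S 先 B 行 E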
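If it helps, here is a minimal sketch of how a segmented corpus could be converted into the character/tag format shown above. It rests on several assumptions that are not confirmed in this thread: that the downloaded training file is UTF-8 text with one sentence per line and words separated by whitespace, that the loader wants one `character tag` pair per line with a blank line between sentences, and that characters in the middle of a word longer than two characters take the tag `I` (the example above only shows B, E and S, so please check the label list in the task's processor). The function names and file paths are hypothetical.

```python
def word_to_tags(word, middle_tag="I"):
    """Pair each character of one word with its segmentation tag."""
    if len(word) == 1:
        return [(word, "S")]
    # First character B, last character E; middle_tag is an assumption.
    tags = ["B"] + [middle_tag] * (len(word) - 2) + ["E"]
    return list(zip(word, tags))


def sentence_to_lines(sentence, middle_tag="I"):
    """Turn one whitespace-segmented sentence into 'char tag' lines."""
    lines = []
    for word in sentence.split():  # split() also handles full-width spaces
        for char, tag in word_to_tags(word, middle_tag):
            lines.append(f"{char} {tag}")
    return lines


def convert(src_path, dst_path, middle_tag="I"):
    """Convert a segmented corpus file; sentences are separated by a blank line."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            lines = sentence_to_lines(raw, middle_tag)
            if lines:  # skip empty input lines
                dst.write("\n".join(lines) + "\n\n")


if __name__ == "__main__":
    # Reproduces the tagging of the example sentence from this thread.
    for line in sentence_to_lines("扬帆 远东 做 与 中国 合作 的 先行"):
        print(line)
```

A call such as `convert("msr_training.utf8", "train.tsv")` would then write the whole file, where both file names are placeholders for whatever your download and `--data_dir` actually use.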
python run_token_level_classification.py \
    --task_name cwsmsra \
    --do_train \
    --do_eval \
    --do_lower_case \
    --data_dir data/msra_ner \
    --bert_model data/ZEN_pretrain_base_v0.1.0 \
    --max_seq_length 256 \
    --train_batch_size 96 \
    --num_train_epochs 30 \
    --warmup_proportion 0.1
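For example, I would like to run the fine-tuning command above. For this task, cwsmsra, what format should the training data be in, and where can I conveniently obtain it?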