Closed: Hairmore closed this issue 4 months ago
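@Hairmore Hello, sorry for the late reply. Please use UTF-8 encoding for .conllx files wherever possible. The .txt extension has a special purpose: it denotes a plain-text file.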
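A minimal sketch of how one might check that a data file really is UTF-8 before training, and re-encode it if it is not. The helper names `is_valid_utf8` and `reencode_to_utf8` are hypothetical and not part of SuPar, and the `gbk` default for the source encoding is only an assumption based on the codec named in the error message.

```python
from pathlib import Path


def is_valid_utf8(path: str) -> bool:
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "gbk") -> None:
    """Decode `src` with `source_encoding` (assumed, not detected) and write it to `dst` as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    # Work on bytes so that no newline translation happens on Windows.
    Path(dst).write_bytes(text.encode("utf-8"))
```

If `is_valid_utf8("train.conllx")` already returns `True`, the file itself is probably fine and the decode error more likely comes from how the file is opened than from its contents.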
It's fine, no need to apologize. I'm very grateful for your work and help! I have found the reason for this problem: I was training under Windows. I switched to Linux and the problem disappeared. Thanks a lot!
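The Windows/Linux difference is consistent with how Python picks a default text encoding: when `open()` is called without an `encoding` argument, it uses the locale's preferred encoding, which is often GBK (cp936) on a Chinese-locale Windows machine and usually UTF-8 on Linux. The sketch below only inspects that default. The function name `describe_default_text_encoding` is made up for illustration, and whether the data loader actually opens files without an explicit encoding is an assumption drawn from the traceback, not something confirmed in this thread.

```python
import locale
import sys


def describe_default_text_encoding() -> dict:
    """Report the settings that decide what open() uses when no encoding is given."""
    return {
        "platform": sys.platform,
        # Encoding used by open() when the `encoding` argument is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when Python runs in UTF-8 Mode (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in describe_default_text_encoding().items():
        print(f"{key}: {value}")
```

On Windows, setting the environment variable `PYTHONUTF8=1` (or running `python -X utf8 ...`) enables UTF-8 Mode, which makes UTF-8 the default for `open()`. That may be a workaround for anyone who cannot switch to Linux.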
Oh, the ".txt" suffix was a workaround for another weird problem. Under Windows, if the file ends in .conllu, that problem pops up, but appending .txt to the name makes it go away. I still don't know why.
Sorry for using English; I don't have a Chinese input method set up on my Ubuntu machine yet.
I recommend using CoNLL-U format files with the .conllu/.conllx extension on Linux, which is what I do myself.
Yes, under Linux with .conllu, everything went smoothly
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
    python -u -m supar.cmds.dep.biaffine train -b -d 0 -c dep-biaffine-xlmr -p model --train train.conllx \
        --dev dev.conllx \
        --test test.conllx \
        -encoder=bert \
        --bert=xlm-roberta-large \
        --lr=5e-5 \
        --lr-rate=20 \
        --batch-size=500 \
        --epoch=5 \
        --update-steps=4

My data was originally in CoNLL-U format; I simply changed the file extension to .conllx. When running the command above, I hit this error:

    File "supar\models\dep\biaffine\transform.py", line 422, in load
      for line in lines:
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 39: illegal multibyte sequence

The error disappeared after I renamed the file from train.conllx to train.conllx.txt. The run then got as far as building the fields and the model:

    Building the fields
    Building the model
    [2023-12-14 19:19:58 INFO] BiaffineDependencyModel(
      (encoder): TransformerEmbedding(xlm-roberta-large, n_layers=4, n_out=1024, stride=256, pooling=mean, pad_index=1, finetune=True)
      (encoder_dropout): Dropout(p=0.1, inplace=False)
      (arc_mlp_d): MLP(n_in=1024, n_out=500, dropout=0.33)
      (arc_mlp_h): MLP(n_in=1024, n_out=500, dropout=0.33)
      (rel_mlp_d): MLP(n_in=1024, n_out=100, dropout=0.33)
      (rel_mlp_h): MLP(n_in=1024, n_out=100, dropout=0.33)
      (arc_attn): Biaffine(n_in=500, bias_x=True)
      (rel_attn): Biaffine(n_in=100, n_out=2, bias_x=True, bias_y=True)
      (criterion): CrossEntropyLoss()
    )

However, an error was raised at the "Caching the data" step:

![Capture 2](https://github.com/yzhangcs/parser/assets/85640694/d90b33c2-2711-4302-9d44-2c589cab29f2)
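The `'gbk' codec can't decode byte ...` message typically appears when a UTF-8 file is read with the Windows locale encoding. As a way to test the data independently of the parser, here is a minimal sketch of a CoNLL-style reader that always passes `encoding="utf-8"`. It is not SuPar's loader: `read_conll_sentences` is a hypothetical helper, and it assumes tab-separated columns, `#` comment lines, and blank lines between sentences.

```python
from typing import Iterator, List


def read_conll_sentences(path: str, encoding: str = "utf-8") -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of token rows, each row a list of column strings."""
    sentence: List[List[str]] = []
    # Passing the encoding explicitly avoids depending on the platform default.
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip():
                # A blank line closes the current sentence.
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):
                sentence.append(line.split("\t"))
    # Emit the last sentence if the file does not end with a blank line.
    if sentence:
        yield sentence
```

If this reads `train.conllx` without raising on the same Windows machine, the file contents are valid UTF-8 and the failure is more likely tied to the default encoding used when the file is opened.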
I'm not sure whether this is a file-format problem. Could I ask you for a copy of your training data to test with? My data format is
Thank you very much!