yongzhuo / Keras-TextClassification

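Chinese long-text classification, short-sentence classification, multi-label classification, and two-sentence similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character/word/sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, capsule network (CapsuleNet), Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN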
https://blog.csdn.net/rensihui
MIT License

Error when running python train_multi.py in the multi_label_class folder with self-prepared Toutiao training data #64

Closed: david0718 closed this issue 3 years ago

david0718 commented 3 years ago

Hi, expert! I prepared my own Toutiao training data, and running python train_multi.py in the multi_label_class folder fails with the following error:

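The training data is attached. I downloaded the data from https://github.com/fate233/toutiao-multilevel-text-classfication-dataset, unzipped it, took part of it, and processed it myself. The processing logic is: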

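Header: label|,|ques. Every other line comes from the raw data: split the raw line on |,|, take the second and third fields, and join them as second + |,| + third to form one line.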
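The conversion described above can be sketched in Python as follows. This is a minimal sketch, assuming each raw line has at least three |,|-separated fields, with the label in the second field and the text in the third, as the issue states. The function names and file paths are hypothetical and are not part of keras_textclassification.

```python
from typing import Iterable, List

SEP = "|,|"  # field separator described in the issue


def convert_lines(raw_lines: Iterable[str]) -> List[str]:
    """Build the training file: a 'label|,|ques' header, then one row per raw line.

    Assumption: field 2 is the label and field 3 is the text (1-based),
    as described in the issue.
    """
    out = ["label" + SEP + "ques"]
    for raw in raw_lines:
        fields = raw.rstrip("\r\n").split(SEP)
        if len(fields) < 3:
            continue  # skip malformed lines instead of emitting a broken row
        out.append(fields[1] + SEP + fields[2])
    return out


def convert_file(src_path: str, dst_path: str) -> None:
    """Read raw lines from src_path and write the converted rows to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        rows = convert_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(rows) + "\n")
```

A call such as `convert_file("toutiao_raw.txt", "train.csv")` (hypothetical paths) writes the header plus one converted row per valid raw line. Note that this sketch passes an empty second field through unchanged. Later in this thread, rows like that turned out to be the cause of the error.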

Thanks! Looking forward to your reply. Alternatively, is there a better way to prepare the training data?

DY
------------------------------------------

Traceback (most recent call last):
  File "train_multi.py", line 87, in <module>
    train(rate=1)
  File "train_multi.py", line 74, in train
    ra_ed, rate=rate, shuffle=True)
  File "/usr/local/lib/python3.5/dist-packages/keras_textclassification/data_preprocess/text_preprocess.py", line 412, in preprocess_label_ques_to_idx
    label_single_index = [l2i_i2l['l2i'][ls] for ls in label_single]
  File "/usr/local/lib/python3.5/dist-packages/keras_textclassification/data_preprocess/text_preprocess.py", line 412, in <listcomp>
    label_single_index = [l2i_i2l['l2i'][ls] for ls in label_single]
KeyError: ''

Attachment: train.csv.txt

david0718 commented 3 years ago

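The data contained rows of the form empty|,|content, that is, rows with an empty label. After removing those rows, training ran successfully!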
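The traceback shows `l2i_i2l['l2i']` being indexed with the empty string `''`. An empty label has no entry in the label-to-index mapping, so the lookup raises `KeyError: ''`. The fix reported above amounts to dropping such rows before training. Below is a minimal sketch of that filtering step, assuming the label is everything before the first |,| on each line and that the first line is the `label|,|ques` header. The function name is hypothetical and is not part of keras_textclassification.

```python
from typing import List

SEP = "|,|"  # field separator used in the training file


def drop_empty_labels(lines: List[str]) -> List[str]:
    """Keep the header line; drop data rows whose label field is empty.

    Assumption: the label is the text before the first |,| on each row.
    Rows without a separator at all are dropped as well.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    kept = [header]
    for row in rows:
        label, sep, _text = row.partition(SEP)
        if not sep or not label.strip():
            continue  # an empty label would later be looked up as '' and fail
        kept.append(row)
    return kept
```

Running the training file's lines through a filter like this before calling train_multi.py avoids passing `''` as a label. Whether empty labels should instead be mapped to an explicit placeholder class depends on the dataset. This sketch simply discards those rows, matching what worked in this thread.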