yongzhuo / Keras-TextClassification

Chinese long-text classification, short-sentence classification, multi-label classification, and two-sentence similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character/word/sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN
https://blog.csdn.net/rensihui
MIT License
1.78k stars, 405 forks

Error when training text similarity #47

Closed xkungfu closed 4 years ago

xkungfu commented 4 years ago

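I need to test short-text similarity. After looking through all the .py files in the package, my guess is that train and predict under test/sentence_similarity implement this feature.

I'm not sure whether I guessed correctly.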

Then I ran: /Keras-TextClassification/test/sentence_similarity$ python train.py

and got this error:


File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/home/datad/pyroot/similartextKT/env/lib/python3.6/site-packages/keras_textclassification/data/sim_webank/train.csv'

Does train.csv refer to sim_webank.csv in the downloaded resources?

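I renamed sim_webank.csv to train.csv and put it under /keras_textclassification/data/sim_webank/, but then many other errors appeared. I had managed, with some difficulty, to solve a few earlier problems, but at this point I can't debug any further.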

How should I train and run short-text similarity? Thanks!

xkungfu commented 4 years ago

\keras_textclassification\m00_Albert\train.py also seems to be related to text similarity. It also raises an error when I use it:

Traceback (most recent call last):
  File "train.py", line 122, in <module>
    train(rate=1)
  File "train.py", line 110, in train
    ra_ed, rate=rate, shuffle=True)
  File "/home/datad/pyroot/similartextKT/env/lib/python3.6/site-packages/keras_textclassification/data_preprocess/text_preprocess.py", line 443, in preprocess_label_ques_to_idx
    ques_1 = data['sentence1'].tolist()
  File "/home/datad/pyroot/similartextKT/env/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/datad/pyroot/similartextKT/env/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'sentence1'

yongzhuo commented 4 years ago

/test/sentence_similarity has been fixed. The problem was introduced when the code was adapted to other scenarios.
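
For anyone who hits the same two errors after updating, a quick check of the data file before running train.py may save some debugging. The sketch below is not part of keras_textclassification: the helper name `missing_columns` is hypothetical, and the only column name taken from the traceback above is `sentence1`. Any other column you pass in (for example `sentence2` or `label`) is an assumption about what the training script expects. It uses only the standard library.

```python
import csv
import os


def missing_columns(csv_path, required=("sentence1",)):
    """Return the required column names that are absent from the CSV header.

    Hypothetical helper, not part of keras_textclassification.
    Raises FileNotFoundError if csv_path does not exist, which mirrors
    the first error reported in this issue.
    """
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(csv_path)
    # utf-8-sig reads plain UTF-8 and also strips a leading BOM, which
    # would otherwise end up glued to the first column name.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

If the call returns a non-empty list for your train.csv, the file's header does not match what the preprocessing code looks up, which would explain a `KeyError: 'sentence1'` like the one above. In that case renaming sim_webank.csv alone would not be enough; the header would need to match as well.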