Similarity prediction method - Githubissues

lotushacker commented 5 years ago

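For direct similarity prediction, is the procedure as follows? First train on the two CSV files in data by running sim.train() and sim.eval(). Then comment out the sim.train() and sim.eval() steps and keep only sim = BertSim() and sim.set_mode(tf.estimator.ModeKeys.PREDICT), after which sim.predict(sentence1, sentence2) can be used for prediction. Is that right? Thanks.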

terrifyzhao commented 5 years ago

Yes, but I recommend training on your own data.

terrifyzhao commented 5 years ago

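> @terrifyzhao A question: I trained on the Ant Financial data and then ran similarity prediction, but the similarity comes out as [[0.86661905 0.133381 ]] for positive and negative examples alike. I don't know what the cause is. Could you help analyze it? Thanks~

Please double-check that your input data is correct.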

leolle commented 5 years ago

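@terrifyzhao Hi, I have a model trained on the Ant Financial data. When I run predict, it stops at the point below and does not proceed. What should I do?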

2019-04-25 11:22:03.249915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8937 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/wuwei/anaconda3/envs/tfgpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/wuwei/projects/inference/bert/chinese_L-12_H-768_A-12/../tmp/result/model.ckpt-5468
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2019-04-25 11:22:05.451760: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.9.0 locally

lizhiweiena commented 5 years ago

@leolle Did you ever solve this problem? I am running on a GPU and it also stops at this step.

WangFeng666 commented 5 years ago

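> @leolle Did you ever solve this problem? I am running on a GPU and it also stops at this step.

At first I also thought it was stuck at this step, but then I found that if you just go ahead and enter two sentences at this point, predict works...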
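
The behavior described above is what a blocking read on standard input looks like: the process is not hung, it is waiting for the next sentence pair. The sketch below is not the repository's code. `predict_loop` and `fake_predict` are hypothetical names, and `io.StringIO` stands in for the terminal so that the example is self-contained.

```python
import io


def predict_loop(predict_fn, stream):
    """Read sentence pairs from `stream` and score each pair.

    Each readline() call blocks until a line is available, which is
    why an interactive session looks "stuck" after the model loads.
    """
    results = []
    while True:
        s1 = stream.readline()
        s2 = stream.readline()
        if not s1 or not s2:  # an empty string means end of input
            break
        results.append(predict_fn(s1.strip(), s2.strip()))
    return results


def fake_predict(a, b):
    # Hypothetical stand-in for a real model call such as sim.predict(a, b).
    return 1.0 if a == b else 0.0


# Simulated user input: two sentence pairs, one sentence per line.
fake_stdin = io.StringIO("hello\nhello\nfoo\nbar\n")
scores = predict_loop(fake_predict, fake_stdin)
```

With a real terminal, passing `sys.stdin` as `stream` would make the loop pause after the last log line until two sentences are typed.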

lizhiweiena commented 5 years ago

Yes, exactly, I stumbled on that by accident the other day too. Embarrassing 😓 Thanks for your reply all the same.

WangFeng666 commented 5 years ago

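> Yes, exactly, I stumbled on that by accident the other day too. Embarrassing. Thanks for your reply all the same.

Same embarrassment here......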

Gemini77 commented 4 years ago

Could someone explain why, no matter how similar the sentences I enter are, the text classification prediction is always label 0?

bs.predict('可以取消吗？','可取消么')
INFO:tensorflow: Example
INFO:tensorflow:guid: test-0
INFO:tensorflow:tokens: [CLS] 可 [SEP] 可 [SEP]
INFO:tensorflow:input_ids: 101 1377 102 1377 102
INFO:tensorflow:input_mask: 1 1 1 1 1
INFO:tensorflow:segment_ids: 0 0 0 1 1
INFO:tensorflow:label: 0 (id = 0)
Out[36]: array([[0.25568685, 0.7443132 ]], dtype=float32)

Gemini77 commented 4 years ago

@terrifyzhao

njuljw76 commented 4 years ago

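> Could someone explain why, no matter how similar the sentences I enter are, the text classification prediction is always label 0?
>
> bs.predict('可以取消吗？','可取消么')
> INFO:tensorflow: Example
> INFO:tensorflow:guid: test-0
> INFO:tensorflow:tokens: [CLS] 可 [SEP] 可 [SEP]
> INFO:tensorflow:input_ids: 101 1377 102 1377 102
> INFO:tensorflow:input_mask: 1 1 1 1 1
> INFO:tensorflow:segment_ids: 0 0 0 1 1
> INFO:tensorflow:label: 0 (id = 0)
> Out[36]: array([[0.25568685, 0.7443132 ]], dtype=float32)

Judging from your example, the prediction is in fact "similar". Look at the second number in array([[0.25568685, 0.7443132 ]]): it is the probability that the pair is a positive sample. Since 0.74 > 0.26, the pair is classified as positive.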
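
A minimal sketch of that reading of the output, in plain Python. It assumes, as the reply states, that index 1 holds the positive ("similar") probability. The helper name `interpret` and the 0.5 threshold are illustrative choices, not part of bert-utils.

```python
def interpret(probs, positive_index=1, threshold=0.5):
    """Turn a [[p_negative, p_positive]] result into a decision.

    Assumes the value at `positive_index` is the probability that
    the two sentences are similar.
    """
    row = probs[0]  # predict returns a batch containing a single row
    p_similar = row[positive_index]
    return {"p_similar": p_similar, "similar": p_similar >= threshold}


# Values copied from the outputs quoted in this thread.
result = interpret([[0.25568685, 0.7443132]])
other = interpret([[0.86661905, 0.133381]])
```

Here `result["similar"]` is True, while `other["similar"]` is False for the [[0.86661905 0.133381 ]] output reported earlier in the thread.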

terrifyzhao / bert-utils

Similarity prediction method #9