Closed zhu762 closed 2 years ago
In addition, I added the following two lines of code to the __index_exmaple method of your Example class. I think that without these two lines, reverse_NL_index and reverse_PL_index are always empty and will not work in the subsequent checks. May I ask whether my modification will affect the evaluation results? Finally, thank you for taking time out of your busy schedule to look at my problem!
These lines should be added, but they should have no impact on the evaluation. The reverse index is used to recover the real ids of the prediction instances when only their numeric ids are available. The model uses numeric ids internally, which are included in NL_index and PL_index.
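For readers following along, the two lines in question presumably just invert the forward index dictionaries. A minimal sketch, assuming NL_index and PL_index map real ids to numeric ids (the example ids and index names here are illustrative, not taken from the repository):

```python
# Hypothetical forward indexes: real id -> numeric id used internally by the model.
NL_index = {"req_001": 0, "req_002": 1}
PL_index = {"src/app.py": 0, "src/db.py": 1}

# The two added lines simply invert the mappings, so a numeric id
# produced by the model can be translated back to its real id
# during evaluation:
reverse_NL_index = {v: k for k, v in NL_index.items()}
reverse_PL_index = {v: k for k, v in PL_index.items()}

print(reverse_NL_index[0])  # -> "req_001"
```

If these dictionaries were left empty, any lookup of a predicted numeric id would fail, which matches the behavior described above.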
Thank you for answering my question. I will look for a dataset containing Python code and try again.
I have one more question. I ran the code you provided using the siamese model you supplied, but my results differ considerably from those reported in your paper. The F1 score on the flask dataset is 0.76 and on the pgcli dataset 0.851, which seems to be the opposite of the numbers you reported. However, the F1 score on the keras dataset is 0.964, which is close to your reported value. Is there a problem with this result?
Attached are the evaluation results for the three datasets and the parameter settings of my run: results.zip
I checked the original output file to make sure I did not fill in the wrong columns :) I am not sure what the exact cause is, but the model I uploaded is different from the ones I used in the paper. I guess the randomness and model selection (e.g. which checkpoint to use) have an impact on the downstream task. It is an interesting observation, though.
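Since run-to-run randomness is mentioned as a likely cause of the score differences, one common mitigation is to fix all random seeds before training so that repeated runs are at least comparable. A generic sketch (not code from this repository):

```python
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    # Fix the Python and NumPy RNGs. In a PyTorch training script you
    # would additionally call torch.manual_seed(seed) and
    # torch.cuda.manual_seed_all(seed).
    random.seed(seed)
    np.random.seed(seed)

# Two runs with the same seed draw the same random numbers:
set_seed(0)
a = random.random()
set_seed(0)
b = random.random()
print(a == b)  # -> True
```

Note that even with fixed seeds, checkpoint selection and nondeterministic GPU kernels can still shift downstream scores, so some variation from the published numbers is expected.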
Thank you for your patience, I wish you a happy life.
I formatted the data in the eTOUR dataset as the CSV example in the second step, and then used the siamese model you provided for the second-step training, but the evaluation showed an F1 score of only 0.11, which is not even as good as VSM. I thought it might be caused by the small size of the eTOUR dataset, so I took 100 entries from the keras-team/keras dataset you provided for training and evaluation, and the F1 score even reached 1.0, so I ruled out that possibility. The figure below shows the parameters I used during training. Could you tell me what might be the reason for the poor evaluation results?
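When an F1 score comes out surprisingly low, it can help to recompute it by hand from the raw prediction counts to rule out an evaluation-side mistake. A minimal, self-contained definition of F1 (not the repository's evaluation code):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 10 true positives, 5 false positives, 3 false negatives.
print(f1_score(10, 5, 3))
```

An F1 of 0.11 on eTOUR with a score near 1.0 on a keras subsample suggests the pipeline itself works, and that the gap is more likely a data issue (e.g. domain mismatch or mislabeled links) than a scoring bug.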