First, let me try to answer your question. The output represents the probability of each possible relation for each entpair. It is computed using all the sentences you have for that entpair. So basically you can only get one relation per entpair, which is one of the downsides of this model. This means all the sentences talking about MT00015574#MT00000621 and all the sentences talking about MT00015574#MT00001615 are each aggregated into a single prediction.
Secondly, I'd like to use your example to highlight another issue I have. Normally I would expect the scores for each entpair to sum to 1, but the sum is always less than 1 because the score for relation 0 is missing. I am wondering whether this is done on purpose or not? I mean the score of relation 0 can always be recovered with some math (1 - sum(scores)), but this seems a bit annoying.
@cgallay thanks for your reply. But here comes another question. If each output represents only one entpair, how does the model compute the accuracy? I mean it is correct_results/labels for each sentence, isn't it? So where can I get every sentence's result, if the output doesn't contain them?
And if one entpair can only have one relation, I could get the relation directly after recognizing the entities, so why do I need this model? Just to check whether a certain relation is present in the sentence or not?
@MelvinZang It all depends on which model you choose, as the authors of this GitHub repo implemented many models. But as far as I know, the main assumption they make is:
At least one sentence that mentions a pair of two entities will express their relation.
They implemented models that can be trained on a distant supervision dataset. Basically, in such a dataset, you can have some noise: a sentence that mentions the two entities but doesn't express the relation.
For example, for the entity pair (Bob, Robert), if we have these sentences: ['Robert is the father of Bob', 'Robert and Bob went shopping together', 'Robert and his son Bob live in Canada'], the first and the last sentence clearly express the relation fatherOf, but the second one is considered noise.
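To make this concrete, here is a minimal sketch of how those three sentences would form one training bag (the field names are illustrative, not the repo's actual data format):

```python
# Minimal sketch of one distantly supervised training bag for the (Robert, Bob) example.
# Field names are illustrative, not the repo's actual data format.
bag = {
    "head": "Robert",
    "tail": "Bob",
    "relation": "fatherOf",  # label taken from a knowledge base, not from the sentences
    "sentences": [
        "Robert is the father of Bob",            # expresses fatherOf
        "Robert and Bob went shopping together",  # noise: mentions both entities, no relation
        "Robert and his son Bob live in Canada",  # expresses fatherOf
    ],
}
```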
Regarding the accuracy, you would compute it by looking at each entpair and simply checking whether you predicted it correctly; it's not done at the sentence level but at the entpair level.
I don't get how you can obtain the relation just after recognizing the entities. But to take the small example with (Bob, Robert) from above: if you only have those three sentences, you could use the model to predict the relation between these two entities, and the output should be similar to the JSONs you posted above, but the score for the relation fatherOf should be higher than the score of any other relation.
Hope this helped you. But I didn't write those models, so maybe @gaotianyu1350 can confirm that this is in fact what the models do.
@cgallay Thanks a lot. Maybe I misunderstood your earlier reply. You mean we can only have one relation per entpair in prediction, while we can have many relations per entpair in training, don't you?
My case is that, when training, I have Robert#Bob#fatherof, Robert#Bob#friendsof and so on. In prediction, after I put in many sentences which may contain some of these relations, I can only get one relation for Robert#Bob with the model.
To solve it, I just put the sentences with the same entpair into the model one by one. Is there a better way?
For the accuracy: if the relation I get appears in the train set for that pair, for example fatherof or friendsof, it is correct. And if I get a relation such as capitalof, which belongs to another entpair, it is incorrect. Am I right?
Thanks a lot.
@MelvinZang Yes, you are right, you can have many relations per entpair in the training set. But the model will be able to predict only one.
I guess the whole purpose of this model is its ability to deal with noise while training; distant supervision datasets are quite noisy, but on the other hand it is easier/cheaper to generate a big training dataset this way. So if the training set you are using doesn't come from distant supervision, or you are sure that there are no badly labelled sentences, then I would recommend using a simpler model for relation extraction.
For the accuracy, I think sentences are grouped by ('head', 'tail', 'rel') and one prediction is computed per group, so that you can actually check that the correct label is given. But when you do real prediction (as opposed to computing the accuracy), you don't know the relation in advance, so you can only group sentences by ('head', 'tail').
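As a rough illustration (plain Python, not the repo's actual code), the grouping at prediction time could look like this:

```python
from collections import defaultdict

# Minimal sketch of bag-level grouping (not the repo's actual code).
# At prediction time only the entities are known, so sentences are grouped into
# one bag per ('head', 'tail') pair and the model outputs one score vector per
# bag rather than one per sentence.
def group_into_bags(items):
    bags = defaultdict(list)
    for head, tail, sentence in items:
        bags[(head, tail)].append(sentence)
    return bags

items = [
    ("Robert", "Bob", "Robert is the father of Bob"),
    ("Robert", "Bob", "Robert and Bob went shopping together"),
    ("Robert", "Bob", "Robert and his son Bob live in Canada"),
]
bags = group_into_bags(items)  # one key ('Robert', 'Bob') -> one prediction for all three sentences
```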
PS: Sorry for the late reply, I forgot to post my answer before going on holiday ^^
@cgallay Thanks! This really helps.
By the way, which model can I use if there aren't any badly labeled sentences?
Sorry for the late reply, and thanks @cgallay for the thorough explanations. There is a small mistake: it's actually a multi-label problem, since there may be multiple relations between one pair of entities, so multiple labels can be predicted during testing. We do not simply choose the class with the highest score and calculate accuracy; instead, we use a precision-recall curve to evaluate our models.
The basic idea is enumerating thresholds: the relations with scores higher than the threshold are regarded as predicted. If you enumerate n thresholds, you will get n (precision, recall) points, and then you can plot a precision-recall curve. In the implementation we didn't actually enumerate arbitrary thresholds; you can see the eval part of framework.py for details.
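For intuition, here is a minimal sketch of the threshold-enumeration idea (not the eval code in framework.py): sorting the predictions by score and treating each score as a threshold yields the (precision, recall) points.

```python
import numpy as np

# Minimal sketch of the threshold-enumeration idea (not the eval code in framework.py).
# scores: predicted scores of (entpair, relation) triples; labels: 1 if the triple is true.
def pr_curve(scores, labels):
    order = np.argsort(-np.asarray(scores))            # sort triples by score, descending
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)                             # true positives above each threshold
    precision = tp / (np.arange(len(labels)) + 1)      # tp / number of predicted positives
    recall = tp / max(labels.sum(), 1.0)               # tp / total number of true triples
    return precision, recall                           # n points -> the precision-recall curve

precision, recall = pr_curve(scores=[0.9, 0.8, 0.7, 0.4, 0.2], labels=[1, 1, 0, 1, 0])
```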
As for which model to use when there are no badly labeled sentences: I haven't implemented it yet, but it's actually quite simple, since you only need to delete the selector part and change the way data is fed. I'll update the code soon.
@gaotianyu1350 Thanks for the correction.
I am wondering how you actually set the threshold. Is it set manually, or does the model learn the optimal one by itself?
I am also wondering why the prediction (test_demo.py) doesn't return the score for relation 0?
I guess you are using multiple sigmoids (one per relation) instead of a softmax function (which can only handle a single relation per entity pair), am I right? If I understand correctly, this would explain why you don't return the score for relation 0, as well as why the scores don't sum up to 1.
@cgallay About the threshold: I actually calculate triples (entity_A, entity_B, relation, score) and sort them by score, and then I use those scores as thresholds. All triples with scores higher than the threshold are predicted.
About softmax and sigmoid: currently we use softmax in the training phase, but during inference different methods are adopted. In the attention and maximum models, the scores don't sum up to 1 (you can refer to the paper or the code). In the average model, we simply use a softmax and the sum of the scores is 1 (it's a little naive).
There are papers arguing that using sigmoid during training would be better since it's a multi-label problem. Yet the disadvantage of sigmoid is that there are no constraints between classes, so the model may be harder to train. I haven't tried it yet, but I will add the experiment in the future.
As for not reporting the NA score: in a multi-label problem we are actually training N binary classifiers instead of an N-way classifier, so there is no such thing as an NA score. Yet for training efficiency we still use a softmax N-way classifier, so NA is reported during training.
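As a rough sketch (illustrative, not the repo's code), the two views of the same logits look like this:

```python
import numpy as np

# Rough sketch (not the repo's code) of the two views of the same logits.
logits = np.array([1.2, -0.3, 2.5, 0.1])   # index 0 = NA, indices 1..3 = actual relations

# Training view: an N-way softmax that includes the NA class, so scores sum to 1
# and an NA score exists.
softmax_scores = np.exp(logits) / np.exp(logits).sum()

# Multi-label view: N independent binary classifiers (one sigmoid per relation);
# the NA column is dropped and the remaining scores need not sum to 1.
sigmoid_scores = 1.0 / (1.0 + np.exp(-logits[1:]))
```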
@gaotianyu1350 Thanks for the clarifications. I now have a better understanding ;).
I am still not sure I fully understand how the threshold is set. Let's say I want to predict the relations between two entities in a sentence. By running your model (test_demo.py), I am able to compute the score of each relation (for each pair of entities present in the sentence, but let's say there is only one). I can now sort those relations according to the scores your model gives me. But how do I know whether a score is high enough (above the threshold) to indeed represent a relation expressed in the sentence? If I had a threshold, I could simply keep the relations whose score is higher than it.
Regarding the use of sigmoid during the training phase, I personally think it makes more sense. It would be nice if you could experiment with it. In any case, thanks for your work :)
The threshold is decided by what you're going to do with the model. If false positives are tolerable and you want as few false negatives as possible, which means you need higher recall, then the threshold should be lower. On the contrary, if you want higher precision, which means you want to narrow down the number of false positive instances, you should adopt a higher threshold. In any case, it is usually set manually in real-world applications, depending on the kind of task you are facing.
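For example, one common recipe (a sketch under my own assumptions, not something from this repo) is to choose, on a validation set, the lowest threshold that still meets a target precision:

```python
import numpy as np

# Sketch (not from this repo): on a validation set, pick the lowest threshold whose
# precision still meets a target, which maximizes recall at that precision level.
def pick_threshold(scores, labels, target_precision=0.8):
    order = np.argsort(-np.asarray(scores))
    scores = np.asarray(scores)[order]
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)
    precision = tp / (np.arange(len(labels)) + 1)
    ok = np.where(precision >= target_precision)[0]
    return float(scores[ok[-1]]) if len(ok) else None  # None if the target is never reached

threshold = pick_threshold([0.9, 0.8, 0.7, 0.4], [1, 1, 0, 1], target_precision=0.7)  # -> 0.4
```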
@cgallay @gaotianyu1350
I have met another error recently. When I use my own word vectors, which contain more words than the provided ones, ValueError: GraphDef cannot be larger than 2GB. appears when saving the model as the number of epochs grows. How can I fix it? Thanks a lot.
The error appears here:
path = saver.save(self.sess, os.path.join(ckpt_dir, model_name))
Word embeddings are stored in the checkpoint file too (since they are fine-tuned). Cutting down the vocabulary size, or fixing the embeddings and finding a way to avoid storing those parameters in the checkpoint file, may solve the problem.
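For instance, a possible workaround (a sketch assuming TensorFlow 1.x, not a tested patch for this repo) is to feed the embedding matrix in through a placeholder instead of baking it into the graph, mark it non-trainable, and keep it out of what the Saver writes:

```python
import os
import tensorflow as tf  # assuming TensorFlow 1.x, which this codebase is based on

# Sketch of one possible workaround (an assumption, not a tested patch for this repo):
# feed the large embedding matrix through a placeholder instead of embedding it in the
# graph, keep it fixed (trainable=False), and leave it out of what the Saver writes.
vocab_size, dim = 400000, 50  # hypothetical sizes for a large custom vocabulary
word_vec_placeholder = tf.placeholder(tf.float32, shape=[vocab_size, dim])
word_embedding = tf.get_variable("word_embedding", [vocab_size, dim], trainable=False)
init_embedding = word_embedding.assign(word_vec_placeholder)

# Save every variable except the embedding matrix itself.
saver = tf.train.Saver(var_list=[v for v in tf.global_variables() if v is not word_embedding])

# Later, inside the session (my_word_vec, ckpt_dir and model_name are placeholders here):
# sess.run(init_embedding, feed_dict={word_vec_placeholder: my_word_vec})
# saver.save(sess, os.path.join(ckpt_dir, model_name), write_meta_graph=False)
```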
I run the test_demo.py file and get result.json as follows:
[{"relation": 1, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 2, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 3, "score": 1.0377574859848236e-23, "entpair": "MT00015574#MT00001615"}, {"relation": 4, "score": 3.1533666815183214e-17, "entpair": "MT00015574#MT00001615"}, {"relation": 5, "score": 1.0, "entpair": "MT00015574#MT00001615"}, {"relation": 6, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 7, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 8, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 9, "score": 3.0276185570896847e-38, "entpair": "MT00015574#MT00001615"}, {"relation": 10, "score": 1.6583352665328436e-21, "entpair": "MT00015574#MT00001615"}, {"relation": 11, "score": 1.1470061012097688e-12, "entpair": "MT00015574#MT00001615"}, {"relation": 12, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 13, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 14, "score": 6.406954060414727e-37, "entpair": "MT00015574#MT00001615"}, {"relation": 15, "score": 1.0, "entpair": "MT00015574#MT00001615"}, {"relation": 16, "score": 0.0, "entpair": "MT00015574#MT00001615"}, {"relation": 17, "score": 4.0768455202450195e-29, "entpair": "MT00015574#MT00001615"}, {"relation": 18, "score": 1.1802251857395389e-17, "entpair": "MT00015574#MT00001615"}, {"relation": 19, "score": 2.1728631039285967e-32, "entpair": "MT00015574#MT00001615"}, {"relation": 20, "score": 1.0, "entpair": "MT00015574#MT00001615"}, {"relation": 21, "score": 1.6063005092714633e-11, "entpair": "MT00015574#MT00001615"}, {"relation": 22, "score": 0.0, "entpair": "MT00015574#MT00001615"},
{"relation": 1, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 2, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 3, "score": 1.5956034921966758e-27, "entpair": "MT00015574#MT00000621"}, {"relation": 4, "score": 3.524426855707885e-34, "entpair": "MT00015574#MT00000621"}, {"relation": 5, "score": 1.0, "entpair": "MT00015574#MT00000621"}, {"relation": 6, "score": 3.198225596035746e-18, "entpair": "MT00015574#MT00000621"}, {"relation": 7, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 8, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 9, "score": 1.8198349964571598e-31, "entpair": "MT00015574#MT00000621"}, {"relation": 10, "score": 4.6186643822339425e-30, "entpair": "MT00015574#MT00000621"}, {"relation": 11, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 12, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 13, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 14, "score": 2.5636822667148816e-37, "entpair": "MT00015574#MT00000621"}, {"relation": 15, "score": 0.9877334833145142, "entpair": "MT00015574#MT00000621"}, {"relation": 16, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 17, "score": 0.0, "entpair": "MT00015574#MT00000621"}, {"relation": 18, "score": 9.379044019787019e-26, "entpair": "MT00015574#MT00000621"}, {"relation": 19, "score": 0.7510679960250854, "entpair": "MT00015574#MT00000621"}, {"relation": 20, "score": 0.7352021932601929, "entpair": "MT00015574#MT00000621"}, {"relation": 21, "score": 3.3799745229219635e-14, "entpair": "MT00015574#MT00000621"}, {"relation": 22, "score": 0.0, "entpair": "MT00015574#MT00000621"}]
I have a dataset with 22 relations for training, and its only entpairs are MT00015574#MT00000621 and MT00015574#MT00001615. Then I feed in a test dataset with a thousand sentences, whose entities are already recognized but which have no relation labels, and I want to find the relations with this method. However, I only get two results and I don't know which sentences they represent.
I debugged the project and looked at the parameter 'iter_output' in framework.py. Only the first two indices have values and the others are zero.
I think it relates to my entpairs, but how can I see the predicted relation for each sentence? Thanks a lot.
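One workaround mentioned earlier in the thread is to give each test sentence its own bag so the model emits one prediction per sentence; a minimal sketch of that preprocessing idea (illustrative, not the repo's API):

```python
# Minimal sketch (illustrative, not the repo's API): give every test sentence its own
# bag by making the entpair key unique, so the bag-level output maps back to one sentence.
def one_bag_per_sentence(test_items):
    """test_items: list of dicts with 'head', 'tail' and 'sentence' keys (hypothetical format)."""
    rekeyed = []
    for i, item in enumerate(test_items):
        item = dict(item)  # copy so the original list stays untouched
        item["entpair"] = "{}#{}#sent{}".format(item["head"], item["tail"], i)
        rekeyed.append(item)
    return rekeyed
```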