yao8839836 / kg-bert

KG-BERT: BERT for Knowledge Graph Completion
Apache License 2.0

How to implement run_bert_relation_prediction.py #6

Closed moh-yani closed 4 years ago

moh-yani commented 4 years ago

In your repository, https://github.com/yao8839836/kg-bert, run_bert_relation_prediction.py can be run with this command:

python3 run_bert_relation_prediction.py --task_name kg --do_train --do_eval --do_predict --data_dir ./data/FB15K --bert_model bert-base-cased --max_seq_length 25 --train_batch_size 32 --learning_rate 5e-5 --num_train_epochs 20.0 --output_dir ./output_FB15K/ --gradient_accumulation_steps 1 --eval_batch_size 512

After running that command, we have 5 files: config.json, eval_results.txt, pytorch_model.bin, test_results.txt, and vocab.txt.

However, I have a question: once we have the 5 files above, how do we predict the correct relation given a head and a tail, e.g. (/m/027rn, ?, /m/06cx9)?

Here is an example. The train.tsv file contains the triples below:

/m/027rn /location/country/form_of_government /m/06cx9
/m/017dcd /tv/tv_program/regular_cast./tv/regular_tv_appearance/actor /m/06v8s0
... etc. ...
eof

Given (/m/027rn, ?, /m/06cx9), I want the program to output "/location/country/form_of_government" as the correct relation.

To do that, what should I change, either in how I run run_bert_relation_prediction.py or in the data (train.tsv, dev.tsv, and test.tsv), without retraining as in the first run above?

Could anyone help?

Best regards,

moh-yani

yao8839836 commented 4 years ago

@moh-yani

In line 783 of run_bert_relation_prediction.py, preds holds the predicted relations for the test triples. You can obtain the relation names from the corresponding lines in /data/FB15K/relations.txt.
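The lookup described in this reply can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes relations.txt holds one relation name per line and that label indices follow file order starting at 0. `load_relations` and `indices_to_names` are hypothetical helper names.

```python
from pathlib import Path


def load_relations(path):
    """Read one relation name per line (assumed file layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def indices_to_names(preds, relations):
    """Map predicted label indices to names (0-based file order is assumed)."""
    return [relations[int(i)] for i in preds]
```

For instance, `indices_to_names([52, 326], load_relations("./data/FB15K/relations.txt"))` would return the names stored on the 53rd and 327th lines of the file, if the assumptions above hold.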

moh-yani commented 4 years ago

Many thanks for your response, @yao8839836. Could I ask for some more technical details?

  1. What command should I use to run "run_bert_relation_prediction.py" for relation prediction only? Is it the same as:

python3 run_bert_relation_prediction.py --task_name kg --do_train --do_eval --do_predict --data_dir ./data/FB15K --bert_model bert-base-cased --max_seq_length 25 --train_batch_size 32 --learning_rate 5e-5 --num_train_epochs 20.0 --output_dir ./output_FB15K/ --gradient_accumulation_steps 1 --eval_batch_size 512

or does it need modifications (to the args or otherwise)? For your information, I have already trained on the data by running the command above. So, I want to predict the relation for a given (h, ?, t) afterward.

  2. I saw that line 783 reads "preds = np.argmax(preds, axis=1)". Does that mean I can display the predicted relation with print(preds)?

  3. How do I pass the test input (h, ?, t) to "run_bert_relation_prediction.py"?

I really hope for a response so I can implement it.

Sincerely,

moh-yani

yao8839836 commented 4 years ago

@moh-yani

  1. You can run the above command without --do_train.

  2. Yes, you can, but preds will be indices of labels; you need to find the label names in /data/FB15K/relations.txt.

  3. You can assign it to eval_examples in line 689, just like you did in run_bert_triple_classifier.py.
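To illustrate the second answer: `np.argmax(preds, axis=1)` turns each row of per-relation scores into the index of the highest score, which is why preds contains indices rather than names. A minimal pure-Python sketch of the same operation (the function name and the scores are made up for illustration):

```python
def argmax_rows(score_rows):
    """Return, for each row of scores, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in score_rows]


# Two hypothetical test triples scored against three relations.
scores = [
    [0.1, 2.3, -0.5],
    [1.7, 0.2, 0.9],
]
print(argmax_rows(scores))  # [1, 0]
```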

moh-yani commented 4 years ago

@yao8839836 Thank you for the explanation. The 1st and 2nd points are clear, but I have not understood the 3rd. Here is what I have done:

  1. I ran the script (without --do_train): python3 run_bert_relation_prediction.py --task_name kg --do_eval --do_predict --data_dir ./data/FB15K --bert_model bert-base-cased --max_seq_length 25 --train_batch_size 32 --learning_rate 5e-5 --num_train_epochs 20.0 --output_dir ./output_FB15K/ --gradient_accumulation_steps 1 --eval_batch_size 512

  2. I printed the "preds" variable using print("preds:", preds) at line 783.

  3. This is the part I do not understand yet, so I kept line 689 as in the original: eval_examples = processor.get_test_examples(args.data_dir)

After running that script, the values of the "preds" variable appeared as a list like this: preds: [ 52 326 13 ... 9 26 166]

From your earlier explanation, I guess that they are indices of relation names in relations.txt, aren't they? I also see that the values are predictions for the triples in the "test.tsv" file, where the triples are already complete (containing h, r, t). If I am wrong, please correct me. So, what if only (h, ?, t) is given? Is that possible? If yes, how?

I tried replacing the content of the "test.tsv" file with just one row like: /m/06ms6 ? /m/0bx8pn

However, it seems to fail.

Could you explain this case?

yao8839836 commented 4 years ago

@moh-yani

  1. Yes, preds: [ 52 326 13 ... 9 26 166] are indices of relation names in relations.txt.

  2. You can replace ? in (/m/06ms6 ? /m/0bx8pn) with a random relation name from relations.txt. In test.tsv, the correct relation is used to evaluate the KG-BERT model. In your case, preds will be the prediction results, but evaluation is not possible unless you know the relation label.

moh-yani commented 4 years ago

Oh, I see. So, is it possible to use your model for the case below?

Input->Your_model->Output

Note:
Input: a head entity and a tail entity (given by the user, and arbitrary)
Your_model: KG-BERT (pretrained model of run_bert_relation_prediction.py)
Output: the correct relation linking the head entity and the tail entity (obtained by your KG-BERT framework)

For example:

  1. A user inputs: a. head entity: "/m/01qscs" b. tail entity: "/m/02x8n1n"

  2. KG-BERT receives those values (head and tail entity) as its two inputs and uses them to predict the correct relation from "test.tsv".

  3. Finally, KG-BERT displays "/award/award_nominee/award_nominations./award/award_nomination/award" as the answer. This result will then be used for a later step.

If that is possible, could you tell me how to do it?

Big thanks.

yao8839836 commented 4 years ago

@moh-yani

You can write just one line

"/m/01qscs /award/award_nominee/award_nominations./award/award_nomination/award /m/02x8n1n"

into test.tsv (overwrite it).

The placeholder relation "/award/award_nominee/award_nominations./award/award_nomination/award" can be any other relation in "relations.txt".

Then preds will contain only one index number; use that number to find the name in relations.txt.
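The procedure above can be sketched as a small helper. It is an illustration under assumptions, not part of KG-BERT: `write_query` is a hypothetical name, and the file is assumed to be tab-separated with one head, relation, tail triple per line.

```python
from pathlib import Path


def write_query(data_dir, head, tail, placeholder_relation):
    """Overwrite test.tsv with a single (head, placeholder, tail) triple.

    The placeholder should be a relation listed in relations.txt; per the
    reply above, any such relation can fill the slot.
    """
    path = Path(data_dir) / "test.tsv"
    line = f"{head}\t{placeholder_relation}\t{tail}\n"
    path.write_text(line, encoding="utf-8")
    return path
```

Because this overwrites test.tsv, keeping a copy of the original file first is advisable.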

moh-yani commented 4 years ago

@yao8839836

Thank you for your explanation; it inspired me.

Here, I want to describe my case.

I have read your paper, in particular the Relation Prediction subsection on page 6. Here I quote the first paragraph of that subsection.

"Relation Prediction. This task predicts relations between two given entities, i.e., (h, ?, t). The procedure is similar to link prediction while we rank the candidates with the relation scores s'_τ. We evaluate the relation ranking using Mean Rank (MR) and Hits@1 with filtered setting."
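As a rough illustration of the two metrics named in the quoted paragraph, the sketch below computes the rank of the true relation from a list of scores, then Mean Rank and Hits@1 over several queries. It is a simplified, unfiltered version written for this thread (a filtered setting would additionally discard other known-correct relations before ranking), and all names and numbers are hypothetical.

```python
def rank_of(scores, true_index):
    """1-based rank of the true relation when scores are sorted descending."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_rank_and_hits1(score_rows, true_indices):
    """Return (Mean Rank, Hits@1) over all queries."""
    ranks = [rank_of(row, t) for row, t in zip(score_rows, true_indices)]
    mean_rank = sum(ranks) / len(ranks)
    hits1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return mean_rank, hits1


# Two hypothetical queries, each scored against three relations.
rows = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.9]]
print(mean_rank_and_hits1(rows, [1, 0]))  # (1.5, 0.5)
```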

I have studied this paragraph, and I think I can use your model the way I use the OpenKE framework, where I can predict the relation between two given entities.

As far as I know, OpenKE can handle that case. This is an example of predicting the possible relations between two given entities using OpenKE:

con.predict_relation(808, 981, 10) # (h, t, k): h: head, t: tail, k:top-K

Note:
808: ID of head entity
981: ID of tail entity
10: top-K

So, with that call we can predict the top 10 possible relations between two given entities.

This is what I meant by my question above. Can KG-BERT also handle this case?

I am interested in KG-BERT because it offers better results than OpenKE.

Hopefully you can understand what I mean and can respond to this. I apologize if my post is an inconvenience.

yao8839836 commented 4 years ago

@moh-yani

Thank you for your interest.

Now I understand that you want not only the one relation with the max score, but the top K relations.

In line 751 of run_bert_relation_prediction.py, argsort1 is the list of relation ids in descending order of score; you can select the first K ids as the result.

moh-yani commented 4 years ago

@yao8839836

Okay, how do I express a call like this:

con.predict_relation(808, 981, 10) # in form (h, t, k). h: head, t: tail, k: top-K in OpenKE

Note:
808: ID of head entity (let's say /m/01qscs in Freebase)
981: ID of tail entity (let's say /m/02x8n1n in Freebase)
10: top-K

when running "run_bert_relation_prediction.py"?

yao8839836 commented 4 years ago

@moh-yani

You can write (/m/01qscs, r, /m/02x8n1n) into test.tsv (where r is a random relation name like /award/award_nominee/award_nominations./award/award_nomination/award), and get results like this:

line 750: _, argsort1 = torch.sort(rel_values, descending=True)
line 751: argsort1 = argsort1.cpu().numpy()
line 752: top_k_ids = argsort1[:10]
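The same top-K selection can be sketched without PyTorch. This minimal example sorts relation indices by score in descending order and keeps the first K, mirroring a descending sort followed by slicing. `top_k_relations` and the sample data are hypothetical and not taken from the repository.

```python
def top_k_relations(scores, relations, k=10):
    """Return the k highest-scoring (index, name, score) entries."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, relations[i], scores[i]) for i in order[:k]]


# Hypothetical relation names and scores for one (head, tail) query.
relations = ["/r/a", "/r/b", "/r/c", "/r/d"]
scores = [0.05, 0.60, 0.10, 0.25]
for idx, name, score in top_k_relations(scores, relations, k=2):
    print(idx, name, score)
# 1 /r/b 0.6
# 3 /r/d 0.25
```

Returning the name alongside the index saves a separate lookup in relations.txt, which is close in spirit to a predict_relation(h, t, k) style call.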

moh-yani commented 4 years ago

Okay, I will try this. If I have any questions, I will post them later.

Thank you.

On Tue, 19 Nov 2019 at 00:05, Dr. Liang Yao notifications@github.com wrote:

@moh-yani https://github.com/moh-yani

You can write (/m/01qscs, r, /m/02x8n1n) into test.tsv (where r is a random relation name like /award/award_nominee/award_nominations./award/award_nomination/award), and get results like this:

line 750: _, argsort1 = torch.sort(rel_values, descending=True)
line 751: argsort1 = argsort1.cpu().numpy()
line 752: top_k_ids = argsort1[:10]


moh-yani commented 4 years ago

@yao8839836

Your script:

line 750: _, argsort1 = torch.sort(rel_values, descending=True)
line 751: argsort1 = argsort1.cpu().numpy()
line 752: top_k_ids = argsort1[:10]

is better.

With that script I can get the top-10 possible relation ids from "relations.txt". In my test, I put one triple in "test.tsv":

/m/01qscs /award/award_nominee/award_nominations./award/award_nomination/award /m/02x8n1n

and your program outputs:

top_k_ids number- 0 : [ 37 164 79 84 128 355 329 61 26 32] 37 0

In the output above, 37 is the top-1 relation id, which corresponds to /award/award_nominee/award_nominations./award/award_nomination/award

It's amazing. However, this result is obtained when we put one or more complete triples in the "test.tsv" file, in other words when (h, r, t) are all known. Is it possible if I only know the head and tail entities? In this example I only know the head and tail entities (/m/01qscs for head and /m/02x8n1n for tail), and your program would then predict 37 as the id of the relation I want.

Hopefully you know what I mean.

I will appreciate any explanation you can give.

Many thanks

moh-yani commented 4 years ago

@yao8839836

I think this is enough; your clear explanation helped me understand and solve this case. I can predict relations using your default pattern.

Thank you very much for your attention.

Best regards,

moh-yani