kkteru / grail

Inductive relation prediction by subgraph reasoning, ICML'20

Is the data in the paper inconsistent with the data provided by the repository? #6

Closed zhiweihu1103 closed 3 years ago

zhiweihu1103 commented 3 years ago

I computed statistics on all the dataset versions provided in the repository and found some inconsistencies with the results reported in Table 13 of the paper. Can you give a brief explanation? Paper Table 13: [screenshot]

My statistics table: [screenshot]

The red and bold entries are the inconsistent values. Looking forward to your reply!!!

The statistics code I used is:

    # Count the distinct relations, distinct entities, and total number of
    # triples across the train/valid/test splits of one dataset version.
    root_path = 'data/WN18RR_v2_ind'
    file_list = [root_path + '/train.txt', root_path + '/valid.txt', root_path + '/test.txt']
    relation_set = set()
    entity_set = set()
    count = 0
    for file_path in file_list:
        with open(file_path) as f:
            triplets = [line.split() for line in f if line.strip()]
        count += len(triplets)
        for head, relation, tail in triplets:
            entity_set.update((head, tail))
            relation_set.add(relation)
    print(root_path[root_path.rfind('/') + 1:])  # dataset name
    print(len(relation_set))   # number of distinct relations
    print(len(entity_set))     # number of distinct entities
    print(count)               # total number of triples
kkteru commented 3 years ago

Hi @zhiweihu1103,

Seems like you are right! I probably calculated the statistics on an older version of the datasets. That was a mistake on my part. However, just to reiterate, the datasets posted in this repository are the ones used to generate the results in the paper. I will update the paper with the correct dataset statistics.

Thanks for pointing these out!

zhiweihu1103 commented 3 years ago

Thank you very much for your reply. Another question: according to the README, for inductive prediction we should first train on the fb15k_v1 dataset, and then use test_auc.py or test_ranking.py to test on the fb15k_v1_ind dataset. I don't understand why fb15k_v1_ind is divided into train.txt, valid.txt, and test.txt. In test_auc.py, I find that only test.txt participates in producing the results, so why include train.txt and valid.txt in fb15k_v1_ind at all?

kkteru commented 3 years ago

I tried to explain that in the closed issue #1. Let me know if it is still not clear; happy to elaborate.

zhiweihu1103 commented 3 years ago

In issue #1 you explain the characteristics of the inductive datasets; there is no doubt about that. What I don't understand is why the *_ind datasets are split into train.txt, valid.txt, and test.txt. In my opinion, since the *_ind datasets don't participate in the training process, they shouldn't need to be split. Besides, I find that test_auc.py uses only the test.txt of *_ind. It's confusing to me. Thanks.

zhiweihu1103 commented 3 years ago

Some additional explanation. Point one: in issue #1 you mentioned that "The models are trained on fb237_v1 and tested on fb237_v1_ind in the inductive setting"; in other words, fb237_v1 serves as the train graph and fb237_v1_ind as the inductive test graph. Is that right?

Point two: if point one is right, then I am even more confused about why fb237_v1_ind is divided into train.txt, valid.txt, and test.txt. Since fb237_v1_ind is not used for model training, what is the purpose of splitting it?

Point three: I found that test_auc.py uses only the test.txt of fb237_v1_ind as the test set when computing the AUC-PR metric. Is the code provided here wrong?

Point four: when I reproduce the experiments and replace the test set in test_auc.py with all of the fb237_v1_ind data, the performance changes from 83.05 to 83.12, which is why I care about the problem raised in point three.

Looking forward to your reply!!! Thank you.

kkteru commented 3 years ago

So, the assumption that train.txt of the *_ind folder isn't used in test_auc.py is wrong. The explanation here is precisely what is happening.

If you study the code more carefully, you can see that we call generate_subgraph_datasets, which in turn calls process_files, which uses train.txt to generate the graph on which GraIL makes predictions for the triplets in test.txt. You need a "support" graph on which GraIL can induce rules to make new predictions. You can think of train.txt in the *_ind dataset as the equivalent of a support set in the meta-learning paradigm. GraIL needs that support graph to induce rules and make predictions on the query set (test.txt). In other words, what we train our model to do on the non-ind data is to induce rules from a support (sub)graph and make predictions on test queries. We emulate the same setting at test time: we provide a support set (train.txt of *_ind) and measure the performance of making predictions on the query set (test.txt). Hope that clears up your confusion.

Finally, you are right that valid.txt of the *_ind folders isn't used anywhere. The explanation here is pretty accurate. The splits were done in the same fashion for all dataset folders. In principle, you can train your model on either fb237_v1 or fb237_v1_ind and test the performance on the other for more statistically significant results.
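
To make the data flow concrete, here is a minimal sketch of the evaluation setup described above, in plain Python. The load_triplets helper is hypothetical; in the actual codebase these steps happen inside generate_subgraph_datasets and process_files.

    def load_triplets(path):
        # One whitespace-separated (head, relation, tail) triple per line.
        with open(path) as f:
            return [tuple(line.split()) for line in f if line.strip()]

    # Support graph: the edges GraIL induces rules from at test time.
    support_graph = load_triplets('data/fb237_v1_ind/train.txt')

    # Query set: the triples whose plausibility is actually scored.
    queries = load_triplets('data/fb237_v1_ind/test.txt')

    # valid.txt exists for symmetry with the other dataset folders and is
    # not touched by test_auc.py.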

zhiweihu1103 commented 3 years ago

Thank you very much for the detailed explanation; I fully understand it now. Thanks again!!!

kkteru commented 3 years ago

You are welcome! And just as food for thought: one could take a couple of gradient updates on the support set (train.txt in the *_ind folder) to further improve performance on the query set (test.txt in the *_ind folder). This takes it closer to a meta-learning setup for inductive learning and would probably need the valid.txt.
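
As a hedged sketch of that idea (the model object and its loss(...) method are hypothetical stand-ins, not part of this codebase):

    import copy
    import torch

    def adapt_on_support(model, support_triples, valid_triples, steps=5, lr=1e-4):
        # Take a few gradient steps on the *_ind support set, using the
        # otherwise-unused valid.txt to select the best adapted checkpoint.
        adapted = copy.deepcopy(model)  # leave the original trained model intact
        opt = torch.optim.Adam(adapted.parameters(), lr=lr)
        best_state = copy.deepcopy(adapted.state_dict())
        best_valid = float('inf')
        for _ in range(steps):
            opt.zero_grad()
            loss = adapted.loss(support_triples)  # hypothetical loss API
            loss.backward()
            opt.step()
            with torch.no_grad():
                valid_loss = adapted.loss(valid_triples).item()
            if valid_loss < best_valid:
                best_valid = valid_loss
                best_state = copy.deepcopy(adapted.state_dict())
        adapted.load_state_dict(best_state)
        return adapted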

Thanks for your questions!

zhiweihu1103 commented 3 years ago

Sounds great! Maybe this paper (https://arxiv.org/pdf/2108.00954.pdf) is somewhat similar to the idea you mentioned, but I haven't had time to read it yet.

kkteru commented 3 years ago

Oh, nice! Yea, that is what I was talking about. They seem to do a lot more. Glad someone picked that up! Thanks for sharing.