Closed lyyf2002 closed 1 year ago
What is more, lines 135-142 and 148-153 of 'load_kg_dataset.py' should not run during preprocessing:
    if mode == "test" and inductive:
        print("subsample tasks!!!!!!!!!!!!!!!!!!!")
        self.test_tasks_idx = json.load(open(os.path.join(raw_data_paths, 'sample_test_tasks_idx.json')))
        for r in list(self.tasks.keys()):
            if r not in self.test_tasks_idx:
                self.tasks[r] = []
            else:
                self.tasks[r] = np.array(self.tasks[r])[self.test_tasks_idx[r]].tolist()

    if mode == "test" and inductive:
        for idx, r in enumerate(list(self.all_rels)):
            if len(self.tasks[r]) == 0:
                del self.tasks[r]
                print("remove empty tasks!!!!!!!!!!!!!!!!!!!")
        self.all_rels = sorted(list(self.tasks.keys()))
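One way to keep this logic out of preprocessing is to move it into a helper that is skipped explicitly. The following is only a minimal sketch under assumptions: the function name `subsample_test_tasks` and the `preprocess` flag are hypothetical and do not exist in the repository, and it folds the subsampling and the empty-task removal into one pass.

```python
import numpy as np


def subsample_test_tasks(tasks, test_tasks_idx, mode, inductive, preprocess=False):
    """Return a filtered copy of `tasks`.

    Hypothetical helper: `preprocess` is an assumed flag, not part of the
    original code. Subsampling only applies to the inductive test split and
    is skipped entirely while preprocessing.
    """
    if preprocess or mode != "test" or not inductive:
        return dict(tasks)

    kept = {}
    for r, examples in tasks.items():
        if r not in test_tasks_idx:
            # Relation was not selected for testing: drop it.
            continue
        selected = np.array(examples)[test_tasks_idx[r]].tolist()
        if selected:
            # Keep only relations that still have examples after subsampling.
            kept[r] = selected
    return kept


if __name__ == "__main__":
    tasks = {"r1": [[0, 1], [2, 3], [4, 5]], "r2": [[6, 7]]}
    idx = {"r1": [0, 2]}
    print(subsample_test_tasks(tasks, idx, mode="test", inductive=True))
    print(subsample_test_tasks(tasks, idx, mode="test", inductive=True, preprocess=True))
```

With `preprocess=True` the tasks are returned unchanged, so the preprocessed files would not depend on 'sample_test_tasks_idx.json'.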
Thanks for posting this!
In the paper, we test both the transductive and inductive settings, and the flag controls which of the two versions of the data is generated. See the experiment section. The posted data contains both transductive and inductive data; the inductive data files have the "_inductive" postfix.
For the inductive case, we subsample the test tasks so that the remaining training-time background KG does not become too small, as described in Appendix A.
In case I have not fully understood the bug, I am leaving a comment on the pull request and we can discuss it further there.
When I tried to resample the inductive data for FB15K, I found that simply running the following code in 'graph_sampler.py' produces data that differs from the data you published on Google Drive:
I have now found the cause: when 'inductive = True' is set, your code appends a postfix, so the loaded background graph becomes 'graph_inductive.pt' and 'path_graph_inductive.json'. In the paper, however, the dev/test background graph is not inductive. I have fixed this and will open a pull request as soon as possible.
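The file selection described above could look roughly like the sketch below. This is an illustration under assumptions and not the code from the pull request: the function name `background_graph_paths` and its parameters are hypothetical, and it assumes that only the training split should read the "_inductive" files while dev/test read the unsuffixed ones.

```python
import os


def background_graph_paths(root, mode, inductive):
    """Return (graph_path, path_graph_path) for the given split.

    Hypothetical helper. Assumption: the "_inductive" postfix applies to the
    training background graph only; dev/test use the full graph files.
    """
    postfix = "_inductive" if (inductive and mode == "train") else ""
    graph_path = os.path.join(root, f"graph{postfix}.pt")
    path_graph_path = os.path.join(root, f"path_graph{postfix}.json")
    return graph_path, path_graph_path


if __name__ == "__main__":
    for mode in ("train", "dev", "test"):
        print(mode, background_graph_paths("data", mode, inductive=True))
```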