yeeeqichen / KGQA

Design and implementation of a knowledge-graph-based question answering system, with a visual demo included

Could you provide the KG files required under the KGE_data directory? #6

Closed styanXDU closed 1 year ago

styanXDU commented 1 year ago

train2id.txt entity2id.txt relation2id.txt test2id.txt valid2id.txt

yeeeqichen commented 1 year ago

Hello, these files can be constructed in the following two steps:

train2id.txt: training file. The first line is the number of training triples. Each following line has the format (e1, e2, rel), which indicates that there is a relation rel between e1 and e2. Note that train2id.txt contains ids from entity2id.txt and relation2id.txt, not the names of the entities and relations. If you use your own datasets, please check the format of your training file; files in the wrong format may cause a segmentation fault.

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.
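As a rough illustration of the three formats described above, here is a minimal Python sketch that assigns ids and renders the file contents. It is not the repository's own script. The function names, the (head, relation, tail) input order, the tab separator in the mapping files, and the space separator in the triple file are all assumptions; only the count on the first line and the (e1, e2, rel) column order come from the description above.

```python
from pathlib import Path


def build_id_maps(triples):
    """Assign consecutive integer ids to entities and relations in first-seen order.

    `triples` is an iterable of (head, relation, tail) names (assumed input order).
    """
    entity2id, relation2id = {}, {}
    for head, rel, tail in triples:
        for ent in (head, tail):
            entity2id.setdefault(ent, len(entity2id))
        relation2id.setdefault(rel, len(relation2id))
    return entity2id, relation2id


def format_mapping(mapping):
    """Render entity2id.txt / relation2id.txt: count first, then one 'name<TAB>id' per line."""
    lines = [str(len(mapping))]
    lines += [f"{name}\t{idx}" for name, idx in mapping.items()]
    return "\n".join(lines) + "\n"


def format_triples(triples, entity2id, relation2id):
    """Render train2id.txt: count first, then one 'e1 e2 rel' line of ids per triple."""
    triples = list(triples)
    lines = [str(len(triples))]
    lines += [
        f"{entity2id[h]} {entity2id[t]} {relation2id[r]}" for h, r, t in triples
    ]
    return "\n".join(lines) + "\n"


def write_dataset(out_dir, triples):
    """Write the three training files into out_dir (hypothetical helper)."""
    triples = list(triples)
    entity2id, relation2id = build_id_maps(triples)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "entity2id.txt").write_text(format_mapping(entity2id), encoding="utf-8")
    (out / "relation2id.txt").write_text(format_mapping(relation2id), encoding="utf-8")
    (out / "train2id.txt").write_text(
        format_triples(triples, entity2id, relation2id), encoding="utf-8"
    )
```

Calling `write_dataset("KGE_data", triples)` with a list of name triples would produce the three files. Note that the id columns are written as head, tail, relation, which differs from the (head, relation, tail) order assumed for the input.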

For testing, datasets contain two additional files (five files in total):

test2id.txt: testing file. The first line is the number of triples for testing. Each following line has the format (e1, e2, rel).

valid2id.txt: validation file. The first line is the number of triples for validation. Each following line has the format (e1, e2, rel).

type_constrain.txt: type constraint file. The first line is the number of relations. The following lines are the type constraints for each relation. For example, the relation with id 1200 has 4 types of head entities, which are 3123, 1034, 58 and 5733. The relation with id 1200 has 4 types of tail entities, which are 12123, 4388, 11087 and 11088. You can generate this file with n-n.py in the folder benchmarks/FB15K.

styanXDU commented 1 year ago

kg.txt is not split into training, test, and valid sets, and I see that create_neo4j.py only uses train2id.txt. Is it enough to put all the triples into train2id.txt?

styanXDU commented 1 year ago

Also, if there is no separate training split, how is the KGE task trained?

yeeeqichen commented 1 year ago

Sorry for the late reply.

Regarding the data split: you can deduplicate kg.txt yourself and then split it 8:1:1.
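The deduplicate-then-split suggestion could look roughly like the sketch below. It assumes kg.txt holds one tab-separated (head, relation, tail) triple per line, which this thread does not confirm. `read_kg`, `split_triples`, and the fixed seed are hypothetical names and choices, not part of the repository.

```python
import random


def read_kg(path):
    """Read triples from kg.txt, assuming one tab-separated triple per line."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:  # skip blank or malformed lines
                triples.append(tuple(parts))
    return triples


def split_triples(triples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Deduplicate triples, shuffle them, and split into train/valid/test."""
    unique = list(dict.fromkeys(triples))  # drops duplicates, keeps first-seen order
    random.Random(seed).shuffle(unique)    # seeded so the split is reproducible
    n_train = int(len(unique) * ratios[0])
    n_valid = int(len(unique) * ratios[1])
    train = unique[:n_train]
    valid = unique[n_train:n_train + n_valid]
    test = unique[n_train + n_valid:]      # remainder goes to the test split
    return train, valid, test
```

With a purely random split, some entities or relations may end up only in the valid or test portion. For that reason it is probably safer to build entity2id.txt and relation2id.txt from the full deduplicated set before writing train2id.txt, valid2id.txt, and test2id.txt.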

As for create_neo4j.py in the demo only using train2id.txt: please adjust it to your own needs; this repository only provides an example.