zhenjia2017 / EXAQT

Code for our CIKM'21 paper "Complex Temporal Question Answering on Knowledge Graphs"
https://exaqt.mpi-inf.mpg.de/

EXAQT + TimeQuestions

Update

Note: We have provided a cleaner and more complete implementation here. We hope it helps you reproduce the results in the paper. It can also be used to evaluate other KGQA benchmarks.

Description

This repository contains the code and data for our CIKM'21 full paper. In this paper, we present EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages. The first stage computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, using Group Steiner Trees and fine-tuned BERT models. The second stage constructs relational graph convolutional networks (R-GCNs) from the first stage’s output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations.
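
For orientation, the generic R-GCN layer update that the second stage builds on can be written as below. This is the standard formulation from the R-GCN literature, shown only as background; it is not EXAQT's exact model. The time-aware entity embeddings and the attention over temporal relations mentioned above are extensions on top of it, and their precise form is given in the paper.

```latex
h_i^{(l+1)} = \sigma\left( W_0^{(l)} h_i^{(l)}
  + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}}
    \frac{1}{c_{i,r}} \, W_r^{(l)} h_j^{(l)} \right)
```

Here $h_i^{(l)}$ is the hidden state of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^{r}$ is the set of neighbours of $i$ under relation $r$, $c_{i,r}$ is a normalization constant (for example $|\mathcal{N}_i^{r}|$), $W_r^{(l)}$ and $W_0^{(l)}$ are the relation-specific and self-connection weight matrices, and $\sigma$ is a nonlinearity such as ReLU.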


Wikidata excerpt showing the relevant KG zone for the question "where did obama’s children study when he became president?" with answer Sidwell Friends School.

For more details, see our paper, "Complex Temporal Question Answering on Knowledge Graphs", and visit our project website: https://exaqt.mpi-inf.mpg.de.

If you use this code, please cite:

@article{jia2021complex,
  title={Complex Temporal Question Answering on Knowledge Graphs},
  author={Jia, Zhen and Pramanik, Soumajit and Roy, Rishiraj Saha and Weikum, Gerhard},
  journal={arXiv preprint arXiv:2109.08935},
  year={2021}
}

Setup

The following software is required:

To install the required libraries, it is recommended to create a virtual environment:

python3 -m venv ENV_exaqt
source ENV_exaqt/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

TimeQuestions

The benchmark can be downloaded from here. TimeQuestions contains 16,181 questions. The content of each question includes:

Data

The preprocessed Wikidata facts for each question, the pretrained models, all required intermediate data, and our main results can be downloaded from here (unzip the archive and put it in the root folder of the cloned GitHub repo; the total data size is around 40 GB).

The data folder structure is as follows:

data
├── compactgst
│   ├── train_25_25.json
│   ├── dev_25_25.json
│   └── test_25_25.json
├── connectivity
│   └── seedpair_question_best_connectivity_paths_score.pkl
├── dictionaries
├── files
│   ├── ques_1
│   ├── ques_2
│   ├── ...
│   └── ques_16181
├── model
│   ├── phase1_model.bin
│   ├── phase2_model.bin
│   └── wikipedia2vec_trained
├── result
├── temcompactgst
│   ├── train_25_25_temp.json
│   ├── train_25_25_temp_rank.pkl
│   ├── dev_25_25_temp.json
│   ├── dev_25_25_temp_rank.pkl
│   ├── test_25_25_temp.json
│   └── test_25_25_temp_rank.pkl
├── TimeQuestions
├── phase1_relevant_fact_selection_trainset.csv
├── phase2_temporal_fact_selection_trainset.csv
└── wikidata_property_dictionary.json
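
Before running the pipeline, it can help to confirm that the unzipped archive matches the layout above. The following is a minimal sketch and not part of the repository: the name `missing_data_paths` and the selection of entries in `EXPECTED_PATHS` are assumptions taken from the listing above, and the check only tests for existence, not for file contents.

```python
from pathlib import Path

# Relative paths copied from the folder listing above. This is a subset
# chosen for illustration (an assumption), not an official manifest.
EXPECTED_PATHS = [
    "compactgst/train_25_25.json",
    "compactgst/dev_25_25.json",
    "compactgst/test_25_25.json",
    "connectivity/seedpair_question_best_connectivity_paths_score.pkl",
    "dictionaries",
    "files",
    "model/phase1_model.bin",
    "model/phase2_model.bin",
    "model/wikipedia2vec_trained",
    "result",
    "temcompactgst",
    "TimeQuestions",
    "phase1_relevant_fact_selection_trainset.csv",
    "phase2_temporal_fact_selection_trainset.csv",
    "wikidata_property_dictionary.json",
]


def missing_data_paths(data_root="data", expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_data_paths()` from the repository root returns a list of entries that were not found under `data`; an empty list means every entry in `EXPECTED_PATHS` is present.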

Code

The code structure is as follows:

Graph construction and answer prediction

To reproduce the results, (1) download the data and pre-trained models and save them in the root folder of the cloned GitHub repo, and (2) make sure the path variables in the config.yml file match your own settings.

Then run the following commands:

Step 1: NERD

python get_seed_entity_elq.py  # run this from the BLINK-master directory after building the ELQ environment
python get_seed_entity_tagme.py

Step 2: Score and rank question-relevant facts

python relevant_fact_selection_model.py -d train 
python relevant_fact_selection_model.py -d dev
python relevant_fact_selection_model.py -d test

Step 3: Compute compact subgraph

python get_compact_subgraph.py -d train
python get_compact_subgraph.py -d dev
python get_compact_subgraph.py -d test

Step 4: Score and rank question-relevant temporal facts

python temporal_fact_selection_model.py -d train
python temporal_fact_selection_model.py -d dev
python temporal_fact_selection_model.py -d test

Step 5: Train the answer prediction model and evaluate on the test set

python get_relational_graph.py -d train
python get_relational_graph.py -d dev
python get_relational_graph.py -d test
python get_dictionary.py
python get_pretrained_embedding.py
python train_eva_rgcn.py -p exaqt
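
The commands in Steps 2 to 5 can also be chained from a single driver script. This is a sketch under stated assumptions rather than a script shipped with the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, Step 1 is left out because `get_seed_entity_elq.py` has to run from the BLINK-master directory, and the script assumes it is started from the repository root inside the activated virtual environment.

```python
import subprocess
import sys

SPLITS = ["train", "dev", "test"]

# Scripts run once per split with "-d <split>", in the order listed above.
PER_SPLIT_SCRIPTS = [
    "relevant_fact_selection_model.py",  # Step 2
    "get_compact_subgraph.py",           # Step 3
    "temporal_fact_selection_model.py",  # Step 4
    "get_relational_graph.py",           # Step 5, graph construction
]

# Remaining Step 5 commands, run once after all splits are processed.
FINAL_COMMANDS = [
    ["get_dictionary.py"],
    ["get_pretrained_embedding.py"],
    ["train_eva_rgcn.py", "-p", "exaqt"],
]


def build_commands(python=sys.executable):
    """Return the full command list in the order given in Steps 2 to 5."""
    commands = []
    for script in PER_SPLIT_SCRIPTS:
        for split in SPLITS:
            commands.append([python, script, "-d", split])
    for args in FINAL_COMMANDS:
        commands.append([python] + args)
    return commands


def run_pipeline():
    """Run every command in sequence and stop at the first failure."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_pipeline()` executes the 15 commands one after another. Because `check=True` aborts on the first non-zero exit code, a failed step does not silently feed incomplete output into the next one.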