This repository contains the official implementation for our ICLR 2021 (Oral, Outstanding Paper Award) paper, Complex Query Answering with Neural Link Predictors:
@inproceedings{arakelyan2021complex,
  title={Complex Query Answering with Neural Link Predictors},
  author={Erik Arakelyan and Daniel Daza and Pasquale Minervini and Michael Cochez},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=Mos9F9kDwkz}
}
In this work we present CQD, a method that reuses a pre-trained link predictor to answer complex queries by scoring atom predicates independently and aggregating the scores via t-norms and t-conorms.
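For intuition, here is a minimal sketch (not the repository code) of how two atom scores can be combined with the product t-norm (conjunction) and its dual t-conorm (disjunction); the scores below are made up for illustration:

```python
import torch

def t_norm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Product t-norm: fuzzy conjunction of two atom scores in [0, 1].
    return a * b

def t_conorm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Product t-conorm (probabilistic sum): fuzzy disjunction.
    return a + b - a * b

# Toy scores for two atoms of a conjunctive query, one entry per candidate answer.
s1 = torch.tensor([0.9, 0.2, 0.7])
s2 = torch.tensor([0.8, 0.9, 0.1])
print(t_norm(s1, s2))    # tensor([0.7200, 0.1800, 0.0700])
print(t_conorm(s1, s2))  # tensor([0.9800, 0.9200, 0.7300])
```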
Our code is based on an implementation of ComplEx-N3 available here.
Please follow the instructions below to reproduce the results of our experiments.
We recommend creating a new environment:
% conda create --name cqd python=3.8 && conda activate cqd
% pip install -r requirements.txt
We use 3 knowledge graphs: FB15k, FB15k-237, and NELL.
From the root of the repository, download and extract the files to obtain the `data` folder, containing the sets of triples and queries for each graph.
% wget http://data.neuralnoise.com/cqd-data.tgz
% tar xvf cqd-data.tgz
Next, you need one neural link prediction model per dataset. Our pre-trained models are available here:
% wget http://data.neuralnoise.com/cqd-models.tgz
% tar xvf cqd-models.tgz
To obtain entity and relation embeddings, we use ComplEx. Use the following commands to train the embeddings for each dataset (a sketch of the ComplEx scoring function follows the commands).
% python -m kbc.learn data/FB15k --rank 1000 --reg 0.01 --max_epochs 100 --batch_size 100
% python -m kbc.learn data/FB15k-237 --rank 1000 --reg 0.05 --max_epochs 100 --batch_size 1000
% python -m kbc.learn data/NELL --rank 1000 --reg 0.05 --max_epochs 100 --batch_size 1000
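For reference, this is a minimal sketch of the ComplEx scoring function (illustrative, independent of the kbc code): the score of a triple (s, r, o) is Re(⟨e_s, w_r, conj(e_o)⟩), where each embedding is a complex-valued vector. With `--rank 1000`, each embedding is stored as 2000 floats (real half followed by imaginary half).

```python
import torch

def complex_score(e_s: torch.Tensor, w_r: torch.Tensor, e_o: torch.Tensor) -> torch.Tensor:
    # Complex embeddings stored as [real half; imaginary half].
    rank = e_s.shape[-1] // 2
    s_re, s_im = e_s[..., :rank], e_s[..., rank:]
    r_re, r_im = w_r[..., :rank], w_r[..., rank:]
    o_re, o_im = e_o[..., :rank], e_o[..., rank:]
    # Re(<e_s, w_r, conj(e_o)>), summed over the rank dimension.
    return (s_re * r_re * o_re + s_re * r_im * o_im
            + s_im * r_re * o_im - s_im * r_im * o_re).sum(-1)

# Toy usage with random rank-4 embeddings (8 floats each).
e_s, w_r, e_o = (torch.randn(8) for _ in range(3))
print(complex_score(e_s, w_r, e_o))
```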
Once training is done, the models will be saved in the models
directory.
CQD can answer complex queries via continuous (CQD-CO) or combinatorial optimisation (CQD-Beam).
Use the `kbc.cqd_beam` script to answer queries, providing the path to the dataset and to the link predictor saved in the previous step. For example:
% python -m kbc.cqd_beam --model_path models/[model_filename].pt
Example:
% PYTHONPATH=. python3 kbc/cqd_beam.py \
--model_path models/FB15k-model-rank-1000-epoch-100-*.pt \
--dataset FB15K --mode test --t_norm product --candidates 64 \
--scores_normalize 0 data/FB15k
models/FB15k-model-rank-1000-epoch-100-1602520745.pt FB15k product 64
ComplEx(
(embeddings): ModuleList(
(0): Embedding(14951, 2000, sparse=True)
(1): Embedding(2690, 2000, sparse=True)
)
)
[..]
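Conceptually, for a 2p query ?t : r1(anchor, v) ∧ r2(v, t), CQD-Beam scores the first atom against all entities, keeps the top-k candidates (k is the `--candidates` flag above), and expands each of them through the second atom. Below is a minimal sketch under assumed names; `score_atom` is a hypothetical stand-in for the link predictor, not the repository API:

```python
import torch

def beam_2p(anchor: int, rel1: int, rel2: int,
            score_atom, k: int = 64) -> torch.Tensor:
    # Hypothetical score_atom(subject_ids, rel_id) -> [batch, n_entities],
    # giving a score in [0, 1] for every candidate object entity.

    # Step 1: score r1(anchor, ?v) for all entities and keep the top-k.
    s1 = score_atom(torch.tensor([anchor]), rel1)[0]   # [n_entities]
    beam_scores, beam_vars = s1.topk(k)

    # Step 2: expand every candidate v through r2(v, ?t).
    s2 = score_atom(beam_vars, rel2)                   # [k, n_entities]

    # Aggregate with the product t-norm, then keep the best beam per target.
    combined = beam_scores.unsqueeze(1) * s2           # [k, n_entities]
    return combined.max(dim=0).values                  # [n_entities]
```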
Running the script will save a series of JSON files with the results, e.g.
% cat "topk_d=FB15k_t=product_e=2_2_rank=1000_k=64_sn=0.json"
{
"MRRm_new": 0.7542805715523118,
"MRm_new": 50.71081983144581,
"HITS@1m_new": 0.6896709378392843,
"HITS@3m_new": 0.7955001359095913,
"HITS@10m_new": 0.8676865172456019
}
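The keys follow the repository's output naming. For reference, these metrics can be computed from the 1-based rank of each correct answer; the function below is illustrative, not taken from the repository:

```python
def summarise(ranks):
    # ranks: 1-based rank of each correct answer among all candidates.
    n = len(ranks)
    return {
        "MRR": sum(1.0 / r for r in ranks) / n,
        "MR": sum(ranks) / n,
        "HITS@1": sum(r <= 1 for r in ranks) / n,
        "HITS@3": sum(r <= 3 for r in ranks) / n,
        "HITS@10": sum(r <= 10 for r in ranks) / n,
    }

print(summarise([1, 2, 5, 14]))
# {'MRR': 0.442..., 'MR': 5.5, 'HITS@1': 0.25, 'HITS@3': 0.5, 'HITS@10': 0.75}
```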
Use the `kbc.cqd_co` script to answer queries, again providing the path to the dataset and to the saved link predictor. For example:
% python -m kbc.cqd_co data/FB15k --model_path models/[model_filename].pt --chain_type 1_2
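Unlike beam search, CQD-CO treats the embeddings of the query variables as free parameters and maximises the t-norm-aggregated score by gradient ascent. Below is a minimal sketch for a 2p query under assumed names; `score_emb` is a hypothetical differentiable atom scorer, not the repository API:

```python
import torch

def cqd_co_2p(anchor_emb, rel1_emb, rel2_emb, entity_embs, score_emb,
              steps: int = 100, lr: float = 0.1) -> torch.Tensor:
    # Hypothetical score_emb(subj_emb, rel_emb, obj_emb) -> scalar in [0, 1].
    dim = anchor_emb.shape[-1]
    # Free parameters: embeddings of the intermediate and target variables.
    v = torch.randn(dim, requires_grad=True)
    t = torch.randn(dim, requires_grad=True)
    opt = torch.optim.Adam([v, t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Maximise the product t-norm of the two atom scores.
        loss = -score_emb(anchor_emb, rel1_emb, v) * score_emb(v, rel2_emb, t)
        loss.backward()
        opt.step()
    # Rank answers by scoring the last atom against every real entity embedding.
    with torch.no_grad():
        return torch.stack([score_emb(v, rel2_emb, e) for e in entity_embs])
```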
All results from the paper can be produced as follows:
% cd results/topk
% ../topk-parse.py *.json | grep rank=1000
d=FB15K rank=1000 & 0.779 & 0.584 & 0.796 & 0.837 & 0.377 & 0.658 & 0.839 & 0.355
d=FB237 rank=1000 & 0.279 & 0.219 & 0.352 & 0.457 & 0.129 & 0.249 & 0.284 & 0.128
d=NELL rank=1000 & 0.343 & 0.297 & 0.410 & 0.529 & 0.168 & 0.283 & 0.536 & 0.157
% cd ../cont
% ../cont-parse.py *.json | grep rank=1000
d=FB15k rank=1000 & 0.454 & 0.191 & 0.796 & 0.837 & 0.336 & 0.513 & 0.816 & 0.319
d=FB15k-237 rank=1000 & 0.213 & 0.131 & 0.352 & 0.457 & 0.146 & 0.222 & 0.281 & 0.132
d=NELL rank=1000 & 0.265 & 0.220 & 0.410 & 0.529 & 0.196 & 0.302 & 0.531 & 0.194
When using CQD-Beam for query answering, we can inspect its intermediate decisions. We provide an example implementation for the case of 2p queries over FB15k-237 that generates a log file. To generate this log, add the `--explain` flag when running the `cqd_beam` script. The file will be saved as `explain.log`.
Note: for readability, this requires an extra file mapping FB15k-237 entity identifiers to their original names. Download the file from this link to the `data/FB15k-237` path and untar it.
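Assuming the mapping is a tab-separated file of identifier/name pairs (an assumption about the format; the filename below is hypothetical, so adjust it to whatever the downloaded archive actually contains), it can be loaded like this:

```python
import csv

def load_entity_names(path: str) -> dict:
    # Assumed format: one "<entity_id>\t<readable name>" pair per line.
    with open(path, newline="") as f:
        return {row[0]: row[1]
                for row in csv.reader(f, delimiter="\t") if len(row) >= 2}

# Hypothetical filename; not part of the repository.
names = load_entity_names("data/FB15k-237/entity_names.tsv")
```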