StarE encodes a hyper-relational fact by first passing its qualifier pairs through a composition function; the composed qualifier vectors are then summed and linearly transformed. The resulting vector is merged with the relation vector, and the enriched relation is then combined with the object vector. Finally, node Q937 aggregates messages from this and other hyper-relational edges. Please refer to the paper for details.
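The pipeline above can be sketched in toy form. This is a simplified illustration, not the repository's implementation: the element-wise composition, the identity "transform", and the weighted merge are all assumptions standing in for the paper's actual choices.

```python
import numpy as np

def compose(a, b):
    # Hypothetical composition function: element-wise product
    # (the paper supports several composition choices).
    return a * b

def encode_edge(subj, rel, qualifiers, alpha=0.8):
    """Toy sketch: compose each qualifier (relation, entity) pair,
    sum the results, transform them, and merge the aggregate into
    the relation vector before messaging along the edge."""
    q = sum(compose(qr, qe) for qr, qe in qualifiers)  # sum of composed pairs
    q = np.eye(len(rel)) @ q                           # placeholder linear transform
    rel_q = alpha * rel + (1 - alpha) * q              # weighted merge of rel and quals
    return compose(subj, rel_q)                        # message sent along the edge

subj = np.ones(4)
rel = np.full(4, 0.5)
quals = [(np.full(4, 2.0), np.full(4, 3.0))]           # one (qual_rel, qual_ent) pair
msg = encode_edge(subj, rel, quals)
print(msg)  # [1.6 1.6 1.6 1.6]
```

The receiving node would then aggregate such messages over all of its incident hyper-relational edges.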
## Requirements

Create a new conda environment and execute `setup.sh`.

Alternatively:

```
pip install -r requirements.txt
```
## Datasets

The dataset can be found in `data/clean/wd50k`. Its derivatives can be found there as well:

* `wd50k_33` - approx. 33% of statements have qualifiers
* `wd50k_66` - approx. 66% of statements have qualifiers
* `wd50k_100` - 100% of statements have qualifiers

More information is available in the dataset README.
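For intuition, each statement in these datasets is a main triple plus zero or more qualifier (relation, entity) pairs. The flattened layout below is illustrative only (the actual on-disk format may differ), using the paper's running example:

```python
# Illustrative flattened hyper-relational statement (actual file format may differ).
# Main triple: Albert Einstein (Q937) - educated at (P69) - ETH Zurich (Q11942),
# with one qualifier: academic major (P812) -> physics (Q413).
statement = ["Q937", "P69", "Q11942", "P812", "Q413"]

main_triple = statement[:3]
qualifiers = list(zip(statement[3::2], statement[4::2]))

print(main_triple)   # ['Q937', 'P69', 'Q11942']
print(qualifiers)    # [('P812', 'Q413')]

# Fact length follows 3 + 2 * num_qualifiers (cf. the MAX_QPAIRS parameter):
assert len(statement) == 3 + 2 * len(qualifiers)  # 3 + 2*1 = 5
```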
## Available Models

Specified as `MODEL_NAME` in the running script:

* `stare_transformer` - main model StarE (H) + Transformer (H) [default]
* `stare_stats_baseline` - baseline model Transformer (H)
* `stare_trans_baseline` - baseline model Transformer (T)

Specified as `DATASET` in the running script:

* `jf17k`
* `wikipeople`
* `wd50k` [default]
* `wd50k_33`
* `wd50k_66`
* `wd50k_100`
## Running Experiments

It is advised to run experiments on a GPU, otherwise training may take a long time. Use `DEVICE cuda` to turn on GPU support; the default is `cpu`. Don't forget to set `CUDA_VISIBLE_DEVICES` before `python` if you use `cuda`. Currently tested on `cuda==12.1`.
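The interaction between `DEVICE` and `CUDA_VISIBLE_DEVICES` can be pictured with a generic sketch (this is not the repo's code; a typical PyTorch-style script resolves the device roughly like this):

```python
import os

def resolve_device(device_arg="cpu"):
    """Generic sketch: honour the DEVICE argument, while
    CUDA_VISIBLE_DEVICES (set before launching python) controls
    which physical GPU the process sees as cuda:0."""
    if device_arg == "cuda":
        visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<all GPUs>")
        print(f"visible GPUs: {visible}")
        return "cuda:0"
    return "cpu"

# e.g. CUDA_VISIBLE_DEVICES=0 python run.py DEVICE cuda ...
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(resolve_device("cuda"))  # cuda:0
print(resolve_device())        # cpu
```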
Three parameters control the triple/hyper-relational nature and the max fact length:

* `STATEMENT_LEN`: `-1` for hyper-relational facts [default], `3` for triples
* `MAX_QPAIRS`: max fact length (3 + 2 * num qualifiers), e.g., `15` denotes a fact with 5 qualifiers (3 + 2*5 = 15). `15` is the default for the `wd50k` datasets and `jf17k`; set `7` for `wikipeople` and `3` for triples (in combination with `STATEMENT_LEN 3`)
* `SAMPLER_W_QUALIFIERS`: `True` for hyper-relational models [default], `False` for triple-based models only

The following scripts will train StarE (H) + Transformer (H) for 400 epochs and evaluate on the test set:
```
python run.py DATASET wd50k
```

On a GPU:

```
CUDA_VISIBLE_DEVICES=0 python run.py DEVICE cuda DATASET wd50k
```

Replace `DATASET` with any of the dataset names listed above, e.g.:

```
python run.py DATASET wd50k_33
python run.py DATASET jf17k CLEANED_DATASET False
python run.py DATASET wikipeople CLEANED_DATASET False MAX_QPAIRS 7 EPOCHS 500
```
Triple-based models can be started with this basic set of params:

```
python run.py DATASET wd50k STATEMENT_LEN 3 MAX_QPAIRS 3 SAMPLER_W_QUALIFIERS False
```
More hyperparams are available in the `CONFIG` dictionary in `run.py`. If you want to adjust StarE encoder params, prepend `GCN_` to the params in the `STAREARGS` dict, e.g.:

```
python run.py DATASET wd50k GCN_GCN_DIM 80 GCN_QUAL_AGGREGATE concat
```

This will construct StarE with a hidden dim of 80 and `concat` as the gamma function from the paper.
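To make the `GCN_` prefix concrete, the routing of CLI overrides into the nested dict might look like the sketch below (illustrative only; the repo's actual parser and default values may differ):

```python
def apply_cli_overrides(config, overrides, prefix="GCN_", nested_key="STAREARGS"):
    """Route CLI key/value pairs into a config dict; keys starting with
    the prefix go into the nested encoder-args dict (illustrative only)."""
    for key, value in overrides.items():
        if key.startswith(prefix):
            config[nested_key][key[len(prefix):]] = value  # strip prefix, nest
        else:
            config[key] = value
    return config

# Hypothetical defaults, not the repo's actual values:
config = {"DATASET": "wd50k", "STAREARGS": {"GCN_DIM": 200, "QUAL_AGGREGATE": "sum"}}
apply_cli_overrides(config, {"GCN_GCN_DIM": 80, "GCN_QUAL_AGGREGATE": "concat"})
print(config["STAREARGS"])  # {'GCN_DIM': 80, 'QUAL_AGGREGATE': 'concat'}
```

Note that `GCN_GCN_DIM` is stripped of the routing prefix once, leaving the `GCN_DIM` key inside `STAREARGS`.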
## W&B Logging

It's there out of the box! Create an account on WANDB. Then make sure you have the latest version of the package installed:

```
pip install wandb
```

Locate your API key in the user settings and activate it:

```
wandb login <api_key>
```

Then just use the CLI argument `WANDB True`, and runs will be logged to the `wikidata-embeddings`
project in your active team.

## Citation

```
@inproceedings{StarE,
  title={Message Passing for Hyper-Relational Knowledge Graphs},
  author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
  booktitle={EMNLP},
  year={2020}
}
```
For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de