uclnlp / cqd

Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs
MIT License

Replicating the query explanations from Section 5.5/Figure 3 #1

Closed sbonner0 closed 3 years ago

sbonner0 commented 3 years ago

Hey thanks for releasing the code - is it possible to replicate your results from section 5.5 on explaining the answers to the queries?

pminervini commented 3 years ago

Hi! Sorry for the super late reply -- @dfdazac has some code for that I think! Daniel, can you point @sbonner0 to it?

More recently we reimplemented CQD in @hyren et al.'s https://github.com/snap-stanford/KGReasoning/: https://github.com/snap-stanford/KGReasoning/pull/9 -- for analysing the intermediate variable assignments it should be enough to check their values in discrete.py: https://github.com/pminervini/KGReasoning/blob/main/cqd/discrete.py#L36

sbonner0 commented 3 years ago

Hi @pminervini - no problem, thanks so much for getting back to me. Thanks for the pointers to the KGReasoning repo, I'll check it out.

I actually have two more questions, if that is OK. 1) Do you have any plans to demonstrate how the approach could be run on different datasets, including data preprocessing etc.? 2) In trying to understand the beam search approach you discuss in the paper, am I right in thinking that it uses no trainable parameters?

pminervini commented 3 years ago

Hi @sbonner0, @dfdazac is looking into merging the code for explanations in this repo!

About your questions:

1) At the moment we are using the datasets available in https://github.com/snap-stanford/KGReasoning/ -- we haven't looked into generating our own evaluation data yet, but that's definitely on the TODO list (e.g. it could be interesting to analyse the behaviour of the model on way more complex queries).

2) Yes! It uses an off-the-shelf ComplEx-N3 model trained on 1p queries, and there are no additional trainable parameters. It could be fun to explore models with additional trainable parameters trained on complex queries (rather than just 1p/atomic ones), e.g. for learning parametric t-norms -- let us know if you plan to look into this :)
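To make the "no additional trainable parameters" point concrete, here is a minimal sketch of a beam search over a 2p query (anchor, rel1, V), (V, rel2, T) using only frozen embeddings. It is an illustration, not the code in this repo: the function names, the sigmoid used to squash scores into [0, 1], and the choice of the product t-norm are all assumptions made for the example.

```python
import numpy as np

def complex_scores(E, R, subj, rel):
    """ComplEx scores Re(<e_s, w_r, conj(e_o)>) for every candidate object o."""
    return np.real((E[subj] * R[rel]) @ np.conj(E).T)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def answer_2p(E, R, anchor, rel1, rel2, k=3):
    """Beam search for the query: exists V. rel1(anchor, V) and rel2(V, T).

    E and R are frozen (pre-trained) complex embedding matrices;
    nothing in this function is learned.
    """
    # Step 1: score every entity as a value for the intermediate variable V
    # and keep the k best candidates (the beam).
    v_scores = sigmoid(complex_scores(E, R, anchor, rel1))
    beam = np.argsort(-v_scores)[:k]
    # Step 2: for each V in the beam, score every entity as the target T.
    t_scores = sigmoid(np.stack([complex_scores(E, R, v, rel2) for v in beam]))
    # Combine the two atoms with the product t-norm (assumed here), then keep
    # the best intermediate assignment for each target.
    joint = v_scores[beam][:, None] * t_scores
    return joint.max(axis=0), beam

# Toy usage with random embeddings standing in for a trained model.
rng = np.random.default_rng(0)
E = rng.normal(size=(10, 4)) + 1j * rng.normal(size=(10, 4))  # 10 entities
R = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))    # 3 relations
scores, beam = answer_2p(E, R, anchor=0, rel1=1, rel2=2, k=3)
```

The returned `beam` holds the IDs chosen for the intermediate variable, which are the values one would inspect to explain an answer.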

miselico commented 3 years ago

@pminervini @sbonner0 I am currently working on the query generation for more complex queries and other datasets. It is working for our specific setting, but I want to generalize it a bit and separate it from the rest of the codebase.

sbonner0 commented 3 years ago

Thanks @pminervini @miselico I look forward to hopefully being able to run your approach on other benchmark datasets such as WN18RR as well as getting some explanations.

@pminervini just to dig a little more into the second point - so given the ComplEx embeddings are pre-trained, there is no real concept of training data for the beam search based approach - or am I missing something entirely?

pminervini commented 3 years ago

> @pminervini just to dig a little more into the second point - so given the ComplEx embeddings are pre-trained, there is no real concept of training data for the beam search based approach

That's correct!

dfdazac commented 3 years ago

@sbonner0 thanks for opening this issue initially. I have merged some code in main and an explanation in the README regarding how to generate the explanations. The idea essentially consists of mapping the IDs of the top-k selected entities at each step to their actual name string, and then formatting everything nicely.
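As a rough illustration of that idea (not the code merged in main; see the README for the real instructions), the mapping step could look like the sketch below. The function name `explain`, the `id_to_name` dictionary, and the entity names are hypothetical.

```python
def explain(beam_ids_per_step, id_to_name):
    """Format the top-k entity IDs selected at each step as readable text.

    beam_ids_per_step: list of lists of entity IDs, one list per step.
    id_to_name: dict from entity ID to its name string (hypothetical input).
    """
    lines = []
    for step, ids in enumerate(beam_ids_per_step, start=1):
        # Fall back to a placeholder if an ID has no known name.
        names = [id_to_name.get(i, f"<unknown:{i}>") for i in ids]
        lines.append(f"Step {step}: " + ", ".join(names))
    return "\n".join(lines)

# Made-up IDs and names, purely for illustration.
id_to_name = {0: "Oscar", 1: "Drama", 2: "Actor"}
print(explain([[2, 0], [1]], id_to_name))
```

With the toy inputs above, this prints `Step 1: Actor, Oscar` followed by `Step 2: Drama`.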