Closed cmavro closed 2 years ago
Hi Costas,
Thanks for double-checking! The way we get a KG subgraph for each question is:
Hope this information is helpful!
Thanks
Thanks for the clarification!
For questions whose answers we do not know (e.g., the GrailQA test set), how do we sample the subgraph?
And where can I find the sample code?
Thx
Hi, since GrailQA has a hidden test set, our test set is a split of the dev set.
For WebQSP, as the original dataset does not have a dev set, we split the original train set into in-house train/dev sets (90%/10%), following prior practice (e.g. Ren et al. (2021)). Similarly, for CompWebQ, as the test set is not publicly available, we split the original dev set into in-house dev/test sets (20%/80%). For GrailQA, we split the original dev set into in-house dev/test sets (5%/95%).
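The in-house splits described above can be produced with a simple seeded shuffle-and-slice. This is only a sketch of the idea; the seed, function name, and exact split mechanics are my assumptions, not the authors' actual code:

```python
import random

def in_house_split(examples, dev_fraction=0.1, seed=42):
    """Split a list of examples into in-house train/dev portions.

    `dev_fraction` and `seed` are illustrative choices, not the
    settings used in the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)  # seeded shuffle so the split is reproducible
    n_dev = int(len(examples) * dev_fraction)
    dev = [examples[i] for i in indices[:n_dev]]
    train = [examples[i] for i in indices[n_dev:]]
    return train, dev

# e.g. WebQSP-style 90%/10% in-house train/dev from the original train set
train, dev = in_house_split([{"id": i} for i in range(100)], dev_fraction=0.1)
```

The same helper covers the CompWebQ and GrailQA cases by changing the fraction and the source split.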
The information above will be added in the next version of the arXiv paper we are preparing. Sorry for the confusion in the last version. Hope this information is helpful!
Thanks
Thanks for your reply, it is helpful.
I also wonder how to sample the subgraph (the 60 triples), so I can quickly test a question from the dataset. The Colab does not provide a knowledge-based question answering usage example.
Besides, where can I find the jsonl file mentioned by @cmavro?
thanks a lot!
Thanks! I think I see where the problem lies. Sorry for the confusion; allow me to explain.
TLDR;
1. Check out the data we processed for you here to test your question: just search for your question and combine the text sequence and the structured knowledge sequence into a single input, and it will work in the Colab.
2. The jsonl lies here; we use huggingface to download and prepare it in this segment of code.
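The "combine the text sequence and structured knowledge sequence" step in item 1 can be sketched as follows. The separator token and the triple serialization here are my assumptions for illustration, not the exact format UnifiedSKG uses:

```python
def build_input(question, triples):
    """Concatenate the text sequence (the question) with a linearized
    structured-knowledge sequence to form one seq2seq model input.

    The " ; structured knowledge: " separator and the " | " / " , "
    delimiters are illustrative assumptions, not the exact
    serialization in the processed data files.
    """
    kg_seq = " | ".join(" , ".join(triple) for triple in triples)
    return question + " ; structured knowledge: " + kg_seq

example = build_input(
    "Where was Barack Obama born?",
    [("Barack Obama", "place of birth", "Honolulu")],
)
```

To match the demo exactly, copy the serialization from an entry in the processed data file rather than relying on this sketch.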
Longer version with more details 😃 The logic of the UnifiedSKG framework is:

1) Download and read in the raw data from its source with the scripts in the ./tasks directory.
2) Convert it into the seq2seq form with the scripts in the ./seq2seq_construction directory.
3) Run experiments with the models in the ./models directory through train.py, controlling the procedure with the configs in the ./configure directory and the args passed on the command line.

We trained the weights through the UnifiedSKG framework and uploaded them for usage.
However, considering that not everyone wants to work through the whole framework, and since we want to attract readers with easy usage, we provide a usage demo in Colab that simplifies the data loading procedure, letting users input whatever they like instead of drawing from a dataset. So the coding logic is actually different (i.e., one is for developers and one is for audiences).
Hope this information is helpful!
Thanks
Hello, I was wondering about the same questions that people were asking here and thank you so much for the detailed answers! I have a follow-up question - In the subgraph extraction step, it seems that the gold SPARQL query is needed even during test time, which is a bit unusual. In this way, is it directly comparable to other methods?
Hi, very exciting work!
I have a question on how you create the question-specific subgraphs when using Knowledge Graphs as input (i.e., ComplexWebQ). By navigating in compwebq/test.jsonl, I see that the maximum number of triplets used over all questions is 61 and that at least one answer lies within the subgraphs in 2725/2816 (96.8%) test questions.
Do you use specific mechanisms to prune irrelevant facts, and how do you make sure the subgraphs contain the answers?
Thanks a lot!
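For reference, statistics like the ones above (maximum number of triples, answer coverage) can be recomputed from the jsonl with a short script. The field names `kg_tuples` and `answers` are assumptions about the file layout; adjust them to match the actual keys in compwebq/test.jsonl:

```python
import json

def subgraph_stats(path, triple_key="kg_tuples", answer_key="answers"):
    """Scan a jsonl file of question-specific subgraphs and report
    (a) the maximum number of triples per question and (b) the
    fraction of questions whose subgraph entities contain at least
    one gold answer. Field names are assumptions, not the confirmed
    schema of the released files.
    """
    max_triples, covered, total = 0, 0, 0
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            triples = ex[triple_key]
            max_triples = max(max_triples, len(triples))
            # collect head and tail entities of every triple
            entities = {e for t in triples for e in (t[0], t[-1])}
            covered += any(a in entities for a in ex[answer_key])
            total += 1
    return max_triples, covered / total
```

Running this over the test split should reproduce numbers of the kind quoted above (e.g., a maximum of 61 triples and 2725/2816 questions covered), assuming the field names match.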