haitian-sun / GraftNet

questions about document.json file #28

Closed vongyx closed 2 years ago

vongyx commented 3 years ago

Hello Dr. Sun, in your paper "Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text", you point out: "For the sentence-retrieval step, we found it beneficial to include the title of the article as an additional field in the Lucene index. As most sentences in an article talk about the title entity, this helps in retrieving relevant sentences that do not explicitly mention the entity in the question." I have some questions about this.

  1. I can't find a title field in the documents of documents.json for webqsp, but the documents for wikimovie do have a title field. Did you provide the title in some other form, or is the title field not included for the webqsp dataset?
  2. I find that some documents put the title at the front of their text field. For example, {"text": "Natalie Portman  Natalie Portman (born Neta-Lee Hershlag, ; June 9, 1981) is an actress, producer and director with dual American and Israeli citizenship.", "documentId": 1000}. Here the leading "Natalie Portman" is separated from "Natalie Portman (born Neta-Lee Hershlag,..." by two spaces, but such documents only account for 5%. I wonder whether you used the title only for the first sentence of each entity description. Hoping for your reply :)
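
The title-prefix pattern described in item 2 can be checked mechanically. The following is a minimal sketch, not code from the GraftNet repository: `split_title_prefix` is a hypothetical helper, and it assumes that the title is separated from the passage by exactly two spaces and that the passage itself begins by repeating the title, as in the example above. The sample documents are made up for illustration.

```python
def split_title_prefix(text, sep="  "):
    """Split 'Title<sep>Title rest...' into (title, body).

    Assumption: a title prefix is recognisable because the text after the
    two-space separator starts with the same string as the prefix.
    Returns (None, text) when the pattern is not found.
    """
    head, found, rest = text.partition(sep)
    if found and head and rest.startswith(head):
        return head, rest
    return None, text


# Illustrative documents only; the second one has no title prefix.
docs = [
    {"documentId": 1000,
     "text": "Natalie Portman  Natalie Portman (born Neta-Lee Hershlag, ; "
             "June 9, 1981) is an actress, producer and director."},
    {"documentId": 1001,
     "text": "She won an Academy Award in 2011."},
]

with_title = 0
for doc in docs:
    title, body = split_title_prefix(doc["text"])
    if title is not None:
        with_title += 1
    print(doc["documentId"], repr(title))

# Fraction of documents whose text field carries a recoverable title prefix.
print(with_title / len(docs))
```

Running the same check over the real file would give the share of documents that follow this pattern; documents whose first sentence does not repeat the title would be missed by this heuristic.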
haitian-sun commented 3 years ago

Hi,

Thanks for your email. I can’t remember exactly what we did with the title. You may be able to match the passages back to Wikipedia to find their titles if you need them to run your experiments.

Thanks, Haitian

vongyx commented 3 years ago

Hi @OceanskySun, thanks for your reply! I have some more questions for you.

  1. I need the titles of these passages, but some passages can't be matched back to Wikipedia. Which version of Wikipedia did you use?
  2. Could you share the code that obtains the top 5 Wikipedia articles and retrieves the top 50 passages related to the question? I would really appreciate your help!
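
For readers following the thread, the effect of the title field quoted from the paper can be illustrated with a toy example. This is only a sketch under stated assumptions and is not the authors' retrieval code: it replaces Lucene with a plain term-overlap score, and the names `tokenize`, `score`, `retrieve`, and `title_weight`, as well as the sample sentences, are hypothetical.

```python
import re


def tokenize(s):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", s.lower())


def score(question, text, title="", title_weight=1.0):
    """Count distinct question terms found in the text, plus weighted hits in the title."""
    q = set(tokenize(question))
    body_hits = len(q & set(tokenize(text)))
    title_hits = len(q & set(tokenize(title)))
    return body_hits + title_weight * title_hits


def retrieve(question, sentences, k=50, use_title=True):
    """Return up to k sentences with a positive score, best first."""
    scored = [
        (score(question, s["text"], s["title"] if use_title else ""), i)
        for i, s in enumerate(sentences)
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [sentences[i] for sc, i in ranked[:k] if sc > 0]


# Made-up sentences: the first never mentions the entity by name.
sentences = [
    {"title": "Natalie Portman", "text": "She was born in Jerusalem."},
    {"title": "Jerusalem", "text": "Natalie Wood visited the city."},
]
question = "natalie portman birthplace"

print([s["title"] for s in retrieve(question, sentences, k=1, use_title=True)])
print([s["title"] for s in retrieve(question, sentences, k=1, use_title=False)])
```

With the title included, the sentence from the "Natalie Portman" article ranks first even though its text never names her; without it, only the partially matching sentence from another article is returned.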