haitian-sun / GraftNet


Questions on preprocessing codes #6

Closed. shmsw25 closed this issue 5 years ago.

shmsw25 commented 5 years ago

Hi, thanks a lot for sharing your code.

I am trying to reproduce the preprocessed data by following the instructions in the preprocessing folder, but it does not seem to produce the same files as the released preprocessed data.

  1. Do you plan to release the code that produces the same files as the released preprocessed data (same formatting, etc.)?
  2. The code covers subgraph retrieval but not document retrieval. Do you plan to release the document retrieval part?
  3. I noticed that the files in freebase_2hops/stagg.neighborhoods play a significant role, since they contain the entities used to run PPR, but they are only provided as downloads and there is no code that generates them. Could you provide the code that generates those files, or share details of how you obtained them?

I really appreciate your help!

haitian-sun commented 5 years ago

Thanks for your questions.

For document retrieval, we run DrQA to retrieve the top 5 Wikipedia pages, split the pages into sentences, and run Lucene to retrieve the top 50 sentences. We then run TagME (https://tagme.d4science.org/tagme/) to link entities.
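For the entity-linking step, a minimal sketch of calling the public TagME REST API on a retrieved sentence is below (the gcube token handling, the `rho` confidence threshold, and the `requests`-based call are illustrative assumptions, not the exact script we used):

```python
import requests

TAGME_ENDPOINT = "https://tagme.d4science.org/tagme/tag"

def tagme_link(sentence, gcube_token, min_rho=0.1):
    """Annotate one sentence with TagME; return (mention, Wikipedia title) pairs.

    gcube_token: personal API token obtained by registering with the TagME/D4Science service.
    min_rho: threshold on TagME's rho confidence score (the value here is illustrative).
    """
    params = {"gcube-token": gcube_token, "text": sentence, "lang": "en"}
    resp = requests.get(TAGME_ENDPOINT, params=params, timeout=30)
    resp.raise_for_status()
    # The response JSON contains an "annotations" list with spans, titles, and scores.
    return [(a["spot"], a["title"])
            for a in resp.json().get("annotations", [])
            if "title" in a and a.get("rho", 0.0) >= min_rho]
```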

The stagg.neighborhoods files are described at http://curtis.ml.cmu.edu/kbir/.

shmsw25 commented 5 years ago

Hi @OceanskySun, thanks for your answer!

I have one more question. I would like to recover the textual (surface) form of the model's predictions. Is there a file that maps entity ids to their textual forms? I see the textual forms of the ground-truth answers in your released data, but there does not seem to be a file with a mapping for all entities.
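(For anyone else looking for such a mapping: if it is not in the release, one common way to build it is to extract the `type.object.name` triples from the public Freebase RDF dump. A rough sketch follows, assuming the standard gzipped, tab-separated N-Triples dump and English names only; the file paths and the MID format used by the GraftNet data are assumptions:)

```python
import gzip

NAME_PRED = "<http://rdf.freebase.com/ns/type.object.name>"

def build_name_map(dump_path, out_path):
    """Scan the Freebase RDF dump and write a TSV mapping MID -> English name.

    dump_path: gzipped N-Triples dump (e.g. freebase-rdf-latest.gz, assumed name).
    out_path:  output TSV with one "mid<TAB>name" line per entity.
    """
    with gzip.open(dump_path, "rt", encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            subj, pred, obj = fields[0], fields[1], fields[2]
            # Keep only English name triples, e.g. "Justin Bieber"@en
            if pred != NAME_PRED or not obj.endswith('"@en'):
                continue
            # Subject looks like <http://rdf.freebase.com/ns/m.0dgw9r>
            mid = subj.rsplit("/", 1)[-1].rstrip(">")  # e.g. "m.0dgw9r"
            name = obj[1:-4]                           # strip quotes and @en tag
            fout.write(f"{mid}\t{name}\n")
```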