Closed. saedr closed this issue 4 years ago.
Hi, the preprocessing code we originally added included only the steps for constructing the sub-graph from the KB. To add text to the sub-graph, a Lucene pipeline needs to be run.
We have added the code for preprocessing the WikiMovies dataset in the wikimovie_preprocessing directory. The Lucene pipeline is included there. You can follow a similar procedure to run it on the WebQuestionsSP dataset as well.
Hi,
I tried to use your preprocessing scripts to generate the files required for training; however, the final output of the preprocessing steps differs from what is available through the download link. In fact, data_loader.py cannot handle the output of the preprocessing steps. One clear difference is that the downloaded preprocessed files contain a field called passages, while the output of preprocessing doesn't have that field and instead includes another field called pagerank_score. Could you please elaborate on these fields?
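A minimal sketch of how the field mismatch can be checked, assuming both the downloaded and the generated files store one JSON object per line (an assumption; the file format is not stated in this thread). The helper names field_names and diff_fields and the toy records are hypothetical; only the field names passages and pagerank_score come from the files discussed above.

```python
import json


def field_names(lines):
    """Collect the set of top-level keys seen across JSON Lines records."""
    keys = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            keys.update(json.loads(line).keys())
    return keys


def diff_fields(downloaded_lines, generated_lines):
    """Return (fields only in downloaded, fields only in generated), sorted."""
    downloaded = field_names(downloaded_lines)
    generated = field_names(generated_lines)
    return sorted(downloaded - generated), sorted(generated - downloaded)


# Toy records standing in for the real files (hypothetical content).
downloaded = ['{"question": "q1", "passages": []}']
generated = ['{"question": "q1", "pagerank_score": 0.1}']

only_downloaded, only_generated = diff_fields(downloaded, generated)
print("only in downloaded files:", only_downloaded)
print("only in generated files:", only_generated)
```

With real data, open file handles can be passed in place of the toy lists, since both are iterated line by line.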
Thank you very much!