Closed jiSilverH closed 1 year ago
What is the environment you used that creates this error?
I did not set any data schema specifically. After downloading Wikitext and Newsroom, I indexed the data as is.
This tutorial explains how to install Solr and index data, which is generally what I did (only changing some variable names).
This thread may be helpful: https://stackoverflow.com/questions/22330713/cannot-post-to-a-core-in-solr-using-simpleposttool
Hello. Thank you for your quick response.
I was able to understand and follow all the instructions in the tutorial. However, I encountered an issue when attempting to apply them to the knowledgecollection. Unfortunately, it didn't work as expected.
I have a question regarding the file extension used for the wikitext files. It appears that Solr automatically detects certain file extensions, but the wikitext files I downloaded have the extension '.tokens'. I'm curious to know if you converted the '.tokens' files to either CSV or JSON format before indexing them into Solr.
Thank you.
Hi,
Yes, before indexing I converted the .tokens files to JSON format, as a set of entries. To build each entry, I looped through the .tokens file: when a line starts and ends with " = ", I take its content without the '=' symbols as the key, and everything beneath that line (e.g., paragraphs and lines with multiple "=") as the value. The key is like the 'header' or 'title' of each wiki page, and the value is the knowledge related to that header. The next line that starts and ends with " = " means a new key/header should be created. Is this clear?
Sorry, I forgot to mention this in the README. Thank you for asking.
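The conversion described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the maintainer's actual script: the function names, the output field names `title` and `text`, and the choice to drop any text that precedes the first header are all hypothetical.

```python
import json


def is_top_header(line):
    """True for '= Title =' but not for sub-headers such as '= = Section = ='."""
    return (
        line.startswith("= ")
        and line.endswith(" =")
        and not line.startswith("= =")
        and not line.endswith("= =")
    )


def tokens_to_entries(lines):
    """Group every line under the most recent top-level header."""
    entries = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if is_top_header(line):
            # A new header starts a new entry (assumed field names).
            current = {"title": line[2:-2].strip(), "text": []}
            entries.append(current)
        elif current is not None:
            # Paragraphs and sub-headers both go into the value.
            current["text"].append(line)
    return [
        {"title": e["title"], "text": "\n".join(e["text"])} for e in entries
    ]


def convert_file(src_path, dst_path):
    """Read a .tokens file and write the entries as a JSON array."""
    with open(src_path, encoding="utf-8") as src:
        entries = tokens_to_entries(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(entries, dst, ensure_ascii=False, indent=2)
```

Calling `convert_file` with a .tokens path and an output .json path (both chosen by the user) would produce a file of entries that can then be indexed in place of the raw .tokens file.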
Hello.
Indexing Wikitext worked after converting the files to JSON format. Thanks a lot.
I have one final question. Could you please explain how you obtained the "repetition rate" and "novelty" scores? Did you implement the code based on the formulas given in each paper?
Thank you.
Hi,
For novelty, yes; I pushed my implementation here.
For the repetition rate, since it is more complicated, I instead asked the authors whether they could share their code.
Thanks a lot :D
🙂
Hello,
I'm currently facing difficulties indexing Wikitext with Solr and would appreciate some guidance on setting up the schema.
I got
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
when I execute bin/post -c knowledgecollection -p 8989 datasets/wikitext/*
I think the problem is in the data schema.
How did you set up the schema for Wikitext?
Thank you.