Closed. elshize closed this issue 4 years ago.
I want to convert the TREC 2013 queries into the PISA query format. As suggested, I tried to use extract_topics to convert the TREC queries, but I am getting the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Could not consume tag:
I have tried passing a text file containing the TREC 2013 queries (format: id:query).
> a text file containing TREC 2013 queries (format: id:query)
Can you show one line from this file?
Queries in this format should just go directly to the queries or evaluate_queries programs. extract_topics is designed to be used with this type of file: https://trec.nist.gov/data/terabyte/04/04topics.701-750.txt
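For reference, a typical extract_topics invocation would look roughly like this (the option names here are assumptions from memory and may differ across PISA versions, so please check extract_topics --help):

```sh
# Split a TREC topic file (title/desc/narr format) into per-field
# query files that can then be passed to the queries program.
# -i/--input and -o/--output are assumed option names; verify with --help.
./extract_topics \
    -i 04topics.701-750.txt \
    -o topics
```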
Can you attach your XML file (or a snippet)?
Following is a snippet of my text file (https://trec.nist.gov/data/web/2013/web2013.topics.txt):
201:raspberry pi
Following is a snippet of the XML file (https://trec.nist.gov/data/web/2013/trec2013-topics.xml):
<webtrack2013>
  <!-- Please note that topic and subtopic types (faceted/ambiguous,
       inf/nav) are meant as a general indicator and should not be taken
       as definitive aspects of the query intent. -->
  <!-- Note that the first subtopic is always identical to the description
       sentence. This is to ensure that adhoc-task results are also relevant
       to the subtopic task. -->
  <topic number="201" type="faceted">
    <query>raspberry pi</query>
    <description>
      What is a raspberry pi?
    </description>
    <subtopic number="1" type="inf">
      What is a raspberry pi?
    </subtopic>
    <subtopic number="2" type="inf">
      What software does a raspberry pi use?
    </subtopic>
    <subtopic number="3" type="inf">
      What are hardware options for a raspberry pi?
    </subtopic>
    <subtopic number="4" type="nav">
      How much does a basic raspberry pi cost?
    </subtopic>
    <subtopic number="5" type="inf">
      Find info about the raspberry pi foundation.
    </subtopic>
    <subtopic number="6" type="nav">
      Find a picture of a raspberry pi.
    </subtopic>
  </topic>
</webtrack2013>
I tried passing the text file directly to the ./queries program, but I get a bunch of warnings:
The parse_collection program should produce a *.termlex file. You need to pass this file to the --terms argument when calling queries.
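Something along these lines should work (the paths below are placeholders, and the exact option names may differ in your PISA version, so double-check with queries --help):

```sh
# Run queries with the term lexicon produced by parse_collection,
# so that query terms are mapped to the same term IDs as the index.
# Paths and option names are a sketch; adjust to your setup.
./queries \
    -e block_simdbp \
    -a block_max_wand \
    -i path/to/inv.block_simdbp \
    -w path/to/inv.bmw \
    --terms path/to/fwd.termlex \
    -q web2013-queries.txt \
    -k 10
```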
I did as you asked. Now, the ./queries program is segfaulting:
Ok, this doesn't seem to be related to parsing queries anymore. Can you compile it in Debug and post the stack trace from gdb?
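For example, something like this (assuming the usual CMake workflow; re-run the exact same queries command you used before):

```sh
# Rebuild with debug symbols.
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j

# Run the failing command under gdb; after the crash, `bt` prints the stack trace.
gdb --args ./queries <same arguments as before>
# (gdb) run
# (gdb) bt
```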
I compiled with the "-g" flag and then ran gdb. From gdb, I ran ./queries and got the following output:
There are two more tests we can run here that might help us understand what's happening:
1. Run create_freq_index again, and make sure that you run it with the same codec --- maybe the file is corrupted, or you created a different type of index by accident; that would explain what happened.
2. Build the index with a different type, e.g. block_simdbp, and run the query to see if it fails or not. The problem seems to originate from the Elias-Fano-specific code.
If you could do these two things, that would definitely help out with finding the problem.
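For reference, the rebuild-and-query round trip would look roughly like this (paths are placeholders and the exact option names may differ in your version; check the tools' --help):

```sh
# Rebuild the inverted index with a block codec instead of Elias-Fano.
./create_freq_index \
    -t block_simdbp \
    -c path/to/collection_basename \
    -o inv.block_simdbp \
    --check

# Query it again with the matching encoding; keep the remaining
# arguments (algorithm, wand data, k, ...) the same as before.
./queries \
    -e block_simdbp \
    -i inv.block_simdbp \
    --terms path/to/fwd.termlex \
    -q web2013-queries.txt
```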
As requested, I ran create_freq_index again, and this time I set the index type to block_optpfor. After creating the index, I ran the queries program. It segfaulted; I have attached the backtrace from gdb:
I have repeated the entire process with the index type pefopt, and when I run ./queries I get a segfault. Please advise me what I should do to resolve this issue. Thank you!
Ok, one more question before I investigate further: are you on master right now? And if not, then which commit are you on?
I think I am on master:
Can you run git status?
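Something like this would show both the branch and the exact commit you are on:

```sh
# Show the current branch and working-tree state,
# plus the commit currently checked out.
git status
git log -1 --oneline
```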
Following is the output of git status:
I'm running some tests to see if I can reproduce it. I'll get back to you when I get some results.
If you need the collection that I am trying to parse and run queries on, do let me know; I will share it.
@elshize @ansariyusuf Did either of you end up figuring this one out?
Nope, I never got to the bottom of this. @ansariyusuf, did you ever fix it? What happened with this? I'm sorry I didn't get back to you; I have been busy and this one just slipped my mind. If the problem still persists, let's tackle it!
I did a fresh installation of PISA on a different machine, followed the steps @elshize suggested, and it worked.
In that case, let's close this one; feel free to reopen it if you experience similar problems in the future.
The input query format should be documented in the docs.