Closed jdongca2003 closed 6 years ago
Document properties are usually recorded in lowercase during indexing, e.g. "docno". This means that metaindex lookups use lowercase keys; for example, see https://github.com/terrier-org/terrier-core/blob/4.2/src/trec/org/terrier/structures/outputformat/TRECDocnoOutputFormat.java#L96
What Collection implementation did you index with?
Can you tell us the value of the index.meta.key-names property in your index's data.properties file?
Professor Craig, thanks for the quick response. I just indexed TREC_QA (http://www.aclweb.org/aclwiki/index.php?title=Question_Answering_(State_of_the_art)). I'd like to compare traditional IR methods with a deep QA method. In my documents, QID is the question_id and AID is the response_ID.
etc/terrier.properties is something like
#default controls for query expansion
querying.postprocesses.order=QueryExpansion
querying.postprocesses.controls=qe:QueryExpansion
#default controls for the web-based interface. SimpleDecorate
#is the simplest metadata decorator. For more control, see Decorate.
querying.postfilters.order=SimpleDecorate,SiteFilter,Scope
querying.postfilters.controls=decorate:SimpleDecorate,site:SiteFilter,scope:Scope
#default and allowed controls
querying.default.controls=
querying.allowed.controls=scope,qe,qemodel,start,end,site,scope
#document tags specification
#for processing the contents of
#the documents, ignoring DOCHDR
TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.skip=DOCHDR
#set to true if the tags can be of various case
TrecDocTags.casesensitive=false
TrecDocTags.propertytags=DOCNO,QID,AID,LABEL
indexer.meta.forward.keys=DOCNO,QID,AID,LABEL
indexer.meta.forward.keylens=10,10,10,10
trec.querying.outputformat.docno.meta.key=DOCNO
#query tags specification
TrecQueryTags.doctag=TOP
TrecQueryTags.idtag=NUM
TrecQueryTags.process=TOP,NUM,TITLE
TrecQueryTags.skip=DESC,NARR
#stop-words file
stopwords.filename=stopword-list.txt
#the processing stages a term goes through
termpipelines=Stopwords,PorterStemmer
Hi
I think TrecDocTags.propertytags should not contain DOCNO. Also, doesn't it mention tags that are not present in your example document?
indexer.meta.forward.keys should be docno not DOCNO.
I accept that some of these properties are too confusing, and we are considering ways to simplify things.
Craig
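As a rough way to spot the mismatch Craig describes, the sketch below scans indexer.meta.forward.keys and lists any key that is not all lowercase. It uses only java.util.Properties; `MetaKeyCheck`, `parse` and `nonLowercaseKeys` are hypothetical helper names, not part of Terrier, and the check assumes that lowercase keys are what the rest of the toolkit expects, as suggested above for docno.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of Terrier: flags meta keys that are not lowercase.
class MetaKeyCheck {

    // Parses properties-file text (e.g. the contents of etc/terrier.properties).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the entries of indexer.meta.forward.keys that contain uppercase characters.
    static List<String> nonLowercaseKeys(Properties props) {
        List<String> flagged = new ArrayList<>();
        String raw = props.getProperty("indexer.meta.forward.keys", "");
        for (String key : raw.split(",")) {
            String k = key.trim();
            if (!k.isEmpty() && !k.equals(k.toLowerCase(Locale.ROOT))) {
                flagged.add(k);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Properties props = parse("indexer.meta.forward.keys=DOCNO,QID,AID,LABEL\n");
        // Lists every key from the posted configuration that is not lowercase.
        System.out.println(nonLowercaseKeys(props));
    }
}
```

Whether the other keys (QID, AID, LABEL) also need to be lowercase is not settled in this thread; the check only highlights candidates to review.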
On 9 Mar 2017, at 18:47, JIANXIONG DONG notifications@github.com wrote:
TrecDocTags.propertytags=DOCNO,QID,AID,LABEL
In the Terrier 4.2 branch (line 82): " String docno = m.getIndex().getMetaIndex().getItem("docno", docid); ". If the tag in the document corpus is the upper-case "DOCNO", the above line returns an empty value, because the lookup is case sensitive. Setting "ApplicationSetup.setProperty("TrecDocTags.casesensitive", "false");" did not help.
If I change the above line to " String docno = m.getIndex().getMetaIndex().getItem("DOCNO", docid); ", everything works.
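The behaviour described above can be reproduced without Terrier. The sketch below is a minimal stand-in, assuming the metadata store keeps each key exactly as it was configured at indexing time and matches keys exactly on lookup; `ToyMetaIndex` and its methods are hypothetical and are not Terrier's MetaIndex implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata store; NOT Terrier's MetaIndex.
// Assumption: keys are stored exactly as configured and matched exactly.
class ToyMetaIndex {
    private final Map<String, Map<Integer, String>> store = new HashMap<>();

    // Records a metadata value for a document under the given key.
    void put(String key, int docid, String value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(docid, value);
    }

    // Returns the stored value, or an empty string when the key or docid is unknown,
    // mirroring the empty result reported in this issue.
    String getItem(String key, int docid) {
        Map<Integer, String> column = store.get(key);
        if (column == null) {
            return "";
        }
        return column.getOrDefault(docid, "");
    }

    public static void main(String[] args) {
        ToyMetaIndex meta = new ToyMetaIndex();
        meta.put("DOCNO", 0, "D1"); // indexed with an upper-case key

        System.out.println("[" + meta.getItem("docno", 0) + "]"); // lower-case lookup misses
        System.out.println("[" + meta.getItem("DOCNO", 0) + "]"); // exact-case lookup hits
    }
}
```

Under that assumption, the key used at indexing time and the key used at retrieval time have to agree exactly, which is why changing either the configuration or the lookup makes the problem go away.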
My corpus document is something like:
Some Java code: