Open eyasu11321238a opened 2 months ago
Can you paste the full stack trace? Your error is mystifying: it mentions getDocuments, but that call does not appear in your code. Are you sure the code is in sync with what you are executing?
Hey Craig, thanks for your reply. I checked the getDocument function: I was using getDocumentEntry instead of getDocument. That was the problem, and it works now.
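For anyone landing here later, the fix described above can be sketched roughly as follows. This is only a sketch under assumptions: it assumes `index` is a loaded Terrier Index object (not an IndexRef), that the index was built with both "docno" and "text" stored in the meta index, and that the meta index signals an unknown docno with a negative docid. The `None` fallback is my own addition for illustration and was not part of the original code.

```python
def get_document_text(docno, index):
    """Return the stored text for an external docno, or None if it is unknown.

    Assumption: `index` is a loaded Terrier Index (not an IndexRef) whose
    meta index stores both the "docno" and "text" keys.
    """
    metaindex = index.getMetaIndex()
    # getDocument(key, value) maps a metadata value to the internal integer docid.
    doc_id_int = metaindex.getDocument("docno", docno)
    # Assumption: a negative docid means the docno was not found.
    if doc_id_int < 0:
        return None
    # getItem(key, docid) returns the stored metadata value for that docid.
    return metaindex.getItem("text", doc_id_int)
```

If you only have an index path or an IndexRef, something like `pt.IndexFactory.of(index_path)` should give you the Index object to pass in as `index`.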
Hey guys. I encountered an issue while attempting to retrieve text documents from my indexed TREC file to compute the log-likelihood probability. Specifically, when I run the get_document_text function, I receive the following error:
AttributeError: 'org.terrier.querying.IndexRef' object has no attribute 'getDocuments'
Here are the functions I'm using:
# Function to fetch the document text using the document ID from the index
def get_document_text(doc_id, index):
    metaindex = index.getMetaIndex()
    doc_id_int = metaindex.getDocumentEntry("docno", doc_id)
    document_text = metaindex.getItem("text", doc_id_int)
    return document_text

# Define NTLM scorer
def ntlm_scorer(row):
    query_terms = row['query'].split()
    doc_id = row['docno']
    doc_text = get_document_text(doc_id, index)  # Ensure doc_id is an integer
    score = compute_log_likelihood_score(query_terms, doc_text, word_embeddings)
    return score

# Initial retrieval using DirichletLM
Dirichlet = pt.BatchRetrieve(index_path, wmodel="DirichletLM", controls={'dirichletlm.mu': 1500}, verbose=True)

# Chaining NTLM scoring
pipeline = Dirichlet >> pt.apply.doc_score(ntlm_scorer, verbose=True)
Request: Could you please provide suggestions on how to resolve this error? Any guidance on what might be causing this and how to properly fetch document text from the index would be greatly appreciated.
Thank you!