Hi @eyasu11321238a
I tried the code above and it seems to work fine for me on a basic example (no error with getDocumentFrequency):
!pip install -U python-terrier
import pyterrier as pt
index_ref = pt.terrier.IterDictIndexer('./test.terrier').index([
    {'docno': '1', 'text': 'hello world hello world'},
    {'docno': '2', 'text': 'hello'},
])
index = pt.IndexFactory.of(index_ref)

def get_term_collection_freq(term, index):
    lexicon = index.getLexicon()
    lexicon_entry = lexicon.getLexiconEntry(term)
    return lexicon_entry.getFrequency() if lexicon_entry else 0

def get_document_frequency(term, index):
    lexicon = index.getLexicon()
    lexicon_entry = lexicon.getLexiconEntry(term)
    return lexicon_entry.getDocumentFrequency() if lexicon_entry else 0
get_term_collection_freq("hello", index)
3
get_term_collection_freq("world", index)
2
get_term_collection_freq("oov", index)
0
get_document_frequency("hello", index)
2
get_document_frequency("world", index)
1
get_document_frequency("oov", index)
0
Can you try this example and see if it works on your machine?
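As a sanity check that does not depend on PyTerrier at all, the expected numbers can be reproduced in plain Python. This is only a sketch: it assumes simple whitespace tokenization, which is not Terrier's actual tokenizer or term pipeline, so results may differ on real text. It is meant to show the difference between collection frequency (total occurrences) and document frequency (number of documents containing the term).

```python
from collections import Counter

# Same toy documents as in the indexing example.
docs = [
    {'docno': '1', 'text': 'hello world hello world'},
    {'docno': '2', 'text': 'hello'},
]

collection_freq = Counter()  # total occurrences of each term across all documents
document_freq = Counter()    # number of documents that contain each term

for doc in docs:
    tokens = doc['text'].split()       # assumption: plain whitespace tokenization
    collection_freq.update(tokens)     # count every occurrence
    document_freq.update(set(tokens))  # count each term once per document

for term in ('hello', 'world', 'oov'):
    # A Counter returns 0 for terms it has never seen, matching the "oov" case.
    print(term, collection_freq[term], document_freq[term])
```

For these two documents the counts agree with the values returned by get_term_collection_freq and get_document_frequency.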
@seanmacavaney, Thank you for your response. Both functions work perfectly now after upgrading with !pip install -U python-terrier. I appreciate your help!
I'm not sure what the problem would have been in an older version of PyTerrier, as this functionality is quite old. In future, if you can post the error message, it helps us understand the problem :-)
@eyasu11321238a Can you share the version that you had installed previously? It might help us track the issue down.
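For reference, one way to read the installed version from Python is through the standard library's importlib.metadata. This is a small sketch; installed_version is a hypothetical helper name, and 'python-terrier' is the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if python-terrier is installed, otherwise None.
print(installed_version('python-terrier'))
```

Running this before and after an upgrade gives the two version numbers to include in a bug report.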
@seanmacavaney The version was 0.10.0, and I have now upgraded to 0.11.0.
Thanks!
Hello,
I’m currently using PyTerrier and have implemented two functions to retrieve term-related statistics from an index. Below are the functions I’m working with:
The first function get_term_collection_freq() works as expected, providing the collection frequency of a term. However, the second function get_document_frequency() does not seem to work because the lexicon_entry object does not have a method getDocumentFrequency().
Could you please confirm if there is an equivalent method for getting the document frequency of a term from the lexicon or suggest an alternative way to retrieve it?
Thank you for your help!