kaykay-dv / pocketsearch

A simple full-text search library for Python using SQLite and its FTS5 extension
https://pocketsearch.readthedocs.io/en/latest/
MIT License

Provide access to meta data on tokens #32

Closed kaykay-dv closed 1 year ago

kaykay-dv commented 1 year ago

The FTS5 engine exposes metadata on indexed tokens through a separate virtual table module (fts5vocab). This should be accessible through a dedicated pocketsearch method:

# provide an iterator to the most frequent tokens
self.pocket_search.tokens()
# store all tokens in a list:
list(self.pocket_search.tokens())

Each item should be a dictionary, most frequent token first. Collected into a list, the output looks like this:

{'token': 'the', 'num_documents': 2, 'total_count': 4}
{'token': 'is', 'num_documents': 3, 'total_count': 3}
{'token': 'fence', 'num_documents': 1, 'total_count': 2}
{'token': 'beyond', 'num_documents': 1, 'total_count': 1}
{'token': 'captial', 'num_documents': 1, 'total_count': 1}
{'token': 'england', 'num_documents': 1, 'total_count': 1}
{'token': 'europe', 'num_documents': 1, 'total_count': 1}
{'token': 'fox', 'num_documents': 1, 'total_count': 1}
{'token': 'france', 'num_documents': 1, 'total_count': 1}
{'token': 'he', 'num_documents': 1, 'total_count': 1}
{'token': 'in', 'num_documents': 1, 'total_count': 1}
{'token': 'jumped', 'num_documents': 1, 'total_count': 1}
{'token': 'now', 'num_documents': 1, 'total_count': 1}
{'token': 'of', 'num_documents': 1, 'total_count': 1}
{'token': 'over', 'num_documents': 1, 'total_count': 1}
{'token': 'paris', 'num_documents': 1, 'total_count': 1}

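where token is the indexed term, num_documents is the number of documents containing it, and total_count is the total number of occurrences across all documents.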
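
For reference, here is a minimal sketch of how such statistics can be read from fts5vocab using plain sqlite3. It is not pocketsearch's implementation: the helper name token_stats, the table name docs, and the "_vocab" suffix are assumptions made for illustration, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3


def token_stats(conn, fts_table):
    """Yield per-token statistics for an FTS5 table, most frequent first.

    Hypothetical helper for illustration, not part of pocketsearch.
    """
    vocab_table = f"{fts_table}_vocab"  # assumed naming convention
    # A "row" type fts5vocab table has one row per distinct term with the
    # columns term, doc (documents containing it) and cnt (total occurrences).
    conn.execute(
        f"CREATE VIRTUAL TABLE IF NOT EXISTS {vocab_table} "
        f"USING fts5vocab({fts_table}, 'row')"
    )
    rows = conn.execute(
        f"SELECT term, doc, cnt FROM {vocab_table} ORDER BY cnt DESC, term"
    )
    for term, doc, cnt in rows:
        yield {"token": term, "num_documents": doc, "total_count": cnt}


# Small in-memory index with two documents (sample data, not from the issue).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(text)")
conn.executemany(
    "INSERT INTO docs(text) VALUES (?)",
    [("the fox jumped over the fence",), ("the fence is red",)],
)

stats = list(token_stats(conn, "docs"))
for entry in stats:
    print(entry)
```

The keys mirror the dictionaries shown above; the secondary sort on term only makes the order of equally frequent tokens deterministic.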

kaykay-dv commented 1 year ago

Available in 0.11.0