pytries / datrie

Fast, efficiently stored Trie for Python. Uses libdatrie.
http://pypi.python.org/pypi/datrie/
GNU Lesser General Public License v2.1

Create Nodes on Spaces #37

Closed brunoalano closed 7 years ago

brunoalano commented 7 years ago

I have a problem. For example:

import string
import datrie

trie = datrie.Trie(string.ascii_lowercase + ' ')
keys = ['something nice', 'something cool']
for i, k in enumerate(keys):
    trie[k] = i

trie.prefixes('something nice')  # returns ['something nice']

But shouldn't trie.prefixes return ['something', 'something nice'] in this case? Should I split each sentence on spaces and create a record for each word?

superbobry commented 7 years ago

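datrie does not do any special handling of whitespace characters. You specified ' ' as part of the alphabet, so 'something nice' is a single valid word, and prefixes only returns keys that were actually inserted. 'something' was never inserted, so it is not returned.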

To get the desired behaviour, remove ' ' from the alphabet and split each sentence manually before inserting it into the trie.
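
To make the point concrete, here is a small pure-Python sketch of what a prefix lookup reports. It is only a stand-in built on a linear scan, not datrie's implementation, and `stored_prefixes` is a hypothetical helper name used for illustration.

```python
def stored_prefixes(keys, query):
    """Return the stored keys that are prefixes of `query`, shortest first.

    Hypothetical stand-in that mimics the result of a trie prefix lookup
    with a linear scan; it is not how datrie works internally.
    """
    return sorted((k for k in keys if query.startswith(k)), key=len)


keys = {'something nice', 'something cool'}
print(stored_prefixes(keys, 'something nice'))  # ['something nice']

# Once 'something' is stored as its own key, it is reported too.
keys.add('something')
print(stored_prefixes(keys, 'something nice'))  # ['something', 'something nice']
```

The lookup never invents intermediate keys: a word boundary inside a stored key means nothing unless the text up to that boundary was stored as a key itself.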

brunoalano commented 7 years ago

@superbobry Thanks, no problem.

This is how I did it:

words = title.split()  # title is the sentence being indexed
for tk in (' '.join(words[:i]) for i in range(1, len(words))):
    if tk not in trie:
        trie[tk] = 0
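
The same idea can be sketched as a small reusable helper. The names `word_prefixes` and `include_full` are hypothetical, and a plain dict stands in for the trie so the snippet runs without datrie installed. The assumption is that the real trie would be filled the same way through `trie[tk] = 0`.

```python
def word_prefixes(title, include_full=False):
    """Return the prefixes of `title` that end on a word boundary.

    By default the full title is left out, matching the loop above,
    which stops one word short of the end.
    """
    words = title.split()
    stop = len(words) + 1 if include_full else len(words)
    return [' '.join(words[:i]) for i in range(1, stop)]


store = {}  # stand-in for the trie (assumption: same item-assignment interface)
for title in ['something nice', 'something cool']:
    for tk in word_prefixes(title):
        # setdefault mirrors the "if tk not in trie" guard: existing values are kept
        store.setdefault(tk, 0)

print(sorted(store))                   # ['something']
print(word_prefixes('one two three'))  # ['one', 'one two']
```

The guard matters because the full sentences are already stored with their own values. Without it, a shorter sentence that is also a prefix of a longer one would have its value overwritten with 0.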