seperman / fast-autocomplete

Fast Autocomplete: When Elasticsearch suggestions are not fast and flexible enough
MIT License
272 stars 40 forks

Autocomplete on words with punctuation points bug #20

Closed rallytime closed 3 years ago

rallytime commented 3 years ago

Hello! Firstly, thank you so much for this great autocomplete library.

Secondly, how should punctuation be handled in a word list? I looked through the issues and the docs and didn't see anything referencing the behavior of . in words for this library.

I have a list of cities that I want to use as the words to auto complete and some of the cities have a . in them. Strangely, some short queries will match as well as the full name, but everything in between returns an empty list. This differs from the behavior in words without punctuation.

Here's an example of what I am seeing. I have the latest version of fast_autocomplete installed and I've pared down the word list for easy readability:

Basic setup:

Python 3.8.6 (default, Nov 20 2020, 18:29:40)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fast_autocomplete import AutoComplete
>>> cities = {'st. george': {}, 'some other town': {}, 'st. albans': {}}
>>> autocomplete = AutoComplete(words=cities)

Works as expected with short search:

>>> autocomplete.search('st. ')
[['st. george'], ['st. albans']]
>>> autocomplete.search('st. a')
>>> autocomplete.search('st. g')
[['st. george'], ['st. albans']]

Also works as expected with exact matches:

>>> autocomplete.search('st. george')
[['st. george']]
>>> autocomplete.search('st. albans')
[['st. albans']]

What doesn't work:

>>> autocomplete.search('st. ge')
[]
>>> autocomplete.search('st. al')
[]
>>> autocomplete.search('st. alb')
[]
>>> autocomplete.search('st. alba')
[]
etc....

This behavior differs from the word that doesn't have any punctuation in it:

>>> autocomplete.search('some')
[['some other town']]
>>> autocomplete.search('some ot')
[['some other town']]
>>> autocomplete.search('some other')
[['some other town']]
>>> autocomplete.search('some other t')
[['some other town']]
>>> autocomplete.search('some other to')
[['some other town']]

Is this expected behavior or a bug? Is there a way I can work around this?

Thanks very much in advance. I am happy to help out here as necessary.

seperman commented 3 years ago

Hello, Thanks. Off the top of my head, you need to pass punctuation as valid characters. The example in the readme is for unicode but your problem falls under the same category.

Please see here:

https://github.com/seperman/fast-autocomplete#unicode

Sep Dehpour

On Feb 23, 2021, at 1:40 PM, Nicole Thomas notifications@github.com wrote:

[Quoted text hidden]

rallytime commented 3 years ago

@seperman Ah, perfect! Thank you. That works perfectly and makes sense. I might add a quick PR to note this in the docs for anyone else looking for this answer.

I did try that before, but it was on a more complicated word list that I had put together, and I think there is a bug in my code, so valid_chars_for_string didn't work for me. When I check it with the simple example I posted above, it works very nicely.

Thank you for the quick reply!

rallytime commented 3 years ago

If anyone else finds this issue: I needed to bump the version from 0.6.0 to 0.7.0 to get this to work correctly. You also need to pass the whole list of valid chars that you want, not just the punctuation marks. For my case, I did this:

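import string
from fast_autocomplete import AutoComplete

# Pass every allowed character, not just the punctuation marks.
valid_chars = "." + string.ascii_lowercase
ac = AutoComplete(words=words, valid_chars_for_string=valid_chars)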
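
Since the whole set of valid characters has to be passed, one option is to derive it from the word list itself rather than typing it by hand. The following is only a minimal sketch: collect_valid_chars is a hypothetical helper, not part of fast_autocomplete, and it leaves out spaces and digits on the assumption that the library handles those separately. It uses the cities dictionary from the original report.

```python
import string


def collect_valid_chars(words):
    """Hypothetical helper: return every character to allow in a word.

    Starts from the lowercase ASCII letters and adds any other character
    found in the word list (for example '.'), lowercased. Whitespace and
    digits are skipped; that they are handled elsewhere is an assumption.
    """
    chars = set(string.ascii_lowercase)
    for word in words:
        chars.update(
            ch for ch in word.lower()
            if not ch.isspace() and not ch.isdigit()
        )
    return "".join(sorted(chars))


cities = {'st. george': {}, 'some other town': {}, 'st. albans': {}}
valid_chars = collect_valid_chars(cities)

# With fast_autocomplete 0.7.0 or later installed, the result would be
# passed in the same way as in the comment above:
# from fast_autocomplete import AutoComplete
# autocomplete = AutoComplete(words=cities, valid_chars_for_string=valid_chars)
```

For this word list the helper adds only the period to the default letters, which matches the string built by hand above. A larger list with apostrophes or accented letters would pick those up too.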