seperman / fast-autocomplete

Fast Autocomplete: When Elasticsearch suggestions are not fast and flexible enough
MIT License
267 stars 40 forks

Bug when searching for alphanumeric characters or words with dashes #27

Closed AndreaSottana closed 2 years ago

AndreaSottana commented 3 years ago

Hello again. This is a great repository, and as I've started using it more widely, I've noticed some further bugs. If a word is made up of both letters AND numbers, the autocomplete stops working once the search goes past the first change of character type (number to letter or vice versa). See the example below with the item "012345abc678".

Attempt 1: searching only initial numbers

from fast_autocomplete import AutoComplete
words = {'012345abc678': {}}
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='012345'))

Result is [['012345abc678']] as expected.

Attempt 2: searching numbers and first letter

print(autocomplete.search(word='012345a'))

Result is still [['012345abc678']] as expected.

Attempt 3: going further into the item and including the first two letters

print(autocomplete.search(word='012345ab'))

Result is []. Everything has disappeared. Moreover, if I instead put spaces into the original word and search without spaces, it works fine:

from fast_autocomplete import AutoComplete
words = {'012345 abc 678': {}}
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='012345ab'))

returns [['012345 abc 678']]. Is this a bug that you would be able to fix?

I have also noticed a similar problem with hyphenated words (I suspect the issues might be related, so I'm putting them in a single bug report). Take the example of "user-generated content". See the behaviour below.

Attempt 1: search up to the hyphen but excluding it

from fast_autocomplete import AutoComplete
words = {'user-generated content': {}}
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='user'))

Result is [['user-generated content']] as expected.

Attempt 2: search one letter past the hyphen

print(autocomplete.search(word='user-g'))

Result is still [['user-generated content']] as expected.

Attempt 3: search two letters past hyphen

print(autocomplete.search(word='user-ge'))

Result is []. Everything has disappeared. As with the alphanumeric case, if I remove the hyphen from the indexed words but keep it in the search, it works fine, see below:

from fast_autocomplete import AutoComplete
words = {'user generated content': {}}
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='user-ge'))

returns [['user generated content']]

I would be grateful if you could confirm if this is a bug or a known issue / behaviour and if there are any plans to fix it. Many thanks

sataz-ehl commented 3 years ago

I have a similar issue with brackets:

from fast_autocomplete import AutoComplete
words = {'user content (2 off)': {}}
autocomplete = AutoComplete(words=words)
print(autocomplete.search(word='user content (2 o'))

returns nothing

seperman commented 3 years ago

Regarding things like dashes and brackets, you should tell autocomplete to accept those characters: https://github.com/seperman/fast-autocomplete#unicode. I should update the docs to make it clear that this is not just for unicode and also covers special characters.

Autocomplete does put a space between letters and numbers by design, as part of normalization. That is so it can better handle user inputs like “bmw3series”. You could put the original word back in the context dictionary.

I’m typing this on my phone. If the above doesn’t help, please let me know and I will look into it further.

Sep Dehpour
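To make the normalization described above concrete, here is a minimal sketch of the idea: split a word at every letter/digit boundary and keep the original spelling in the context dictionary. The helper names `split_letters_and_digits` and `build_words` and the `'original'` context key are made up for illustration; this is not the library's own code, and its real normalization may differ in the details.

```python
import re

# Hypothetical helper, NOT fast-autocomplete's implementation: match the
# zero-width position where a letter meets a digit or a digit meets a letter.
_BOUNDARY = re.compile(r'(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])')


def split_letters_and_digits(text):
    """Lowercase the text and insert a space at each letter/digit boundary."""
    return _BOUNDARY.sub(' ', text.lower())


def build_words(originals):
    """Map each normalized key to a context dict holding the original spelling.

    The 'original' key is an assumed name; any context key would do.
    """
    return {split_letters_and_digits(w): {'original': w} for w in originals}


words = build_words(['012345abc678', 'bmw3series'])
print(words)
# {'012345 abc 678': {'original': '012345abc678'}, 'bmw 3 series': {'original': 'bmw3series'}}
```

A dictionary shaped like this could be passed as `words=` to `AutoComplete`, as in the examples earlier in the thread, and the original spelling of a returned key read back with `words[key]['original']`.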


sataz-ehl commented 3 years ago

Hi seperman, thanks for your reply. I tried the following and the issue persists.

from fast_autocomplete import AutoComplete
import string
words = {'user content (2 off)': {}}
valid_chars = string.ascii_letters + string.punctuation
valid_chars_num = string.digits
autocomplete = AutoComplete(words=words, valid_chars_for_string=valid_chars, valid_chars_for_integer=valid_chars_num)
autocomplete.search(word='user content (2 o')

returns nothing

I'd appreciate it if you could take a look. Maybe I'm still missing something. Thank you! EH
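As a quick sanity check on the character sets used above, the following sketch confirms that every character in the indexed word and in the query is covered by them, so the empty result is not caused by a character missing from the whitelist. The helper `uncovered_chars` is a made-up name, and treating the space as always allowed is an assumption rather than documented library behaviour.

```python
import string

# Same character sets as in the comment above.
valid_chars = string.ascii_letters + string.punctuation
valid_chars_num = string.digits


def uncovered_chars(text, allowed):
    """Return the sorted characters of `text` that are not in `allowed`."""
    return sorted({ch for ch in text if ch not in allowed})


# Assumption: the space is treated as a word separator and is always accepted.
allowed = valid_chars + valid_chars_num + ' '

print(uncovered_chars('user content (2 off)', allowed))  # []
print(uncovered_chars('user content (2 o', allowed))     # []
```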

AndreaSottana commented 3 years ago

@seperman I can confirm what @sataz-ehl said; this doesn't seem to make a difference for me either. Example 1:

import string
from fast_autocomplete import AutoComplete
words = {'abcd(efgh)ijk': {}}
autocomplete = AutoComplete(words=words, valid_chars_for_string=string.ascii_letters+string.punctuation)
print(autocomplete.search(word='abcd(efgh)'))

returns []

Example 2:

import string
from fast_autocomplete import AutoComplete
words = {'0123(45)': {}}
autocomplete = AutoComplete(words=words, valid_chars_for_integer=string.digits)
print(autocomplete.search(word='0123(4'))

also returns []

It would be great if you could take a look into it, just in case we're missing something. Thanks a lot!

seperman commented 2 years ago

Sorry for the long delay. This is a bug. Let me fix it now.

seperman commented 2 years ago

This is fixed now. Fast-Autocomplete 0.9.0 is released.