jjw24 / Wox

Launcher for Windows, an alternative to Alfred and Launchy.
http://wox.one
MIT License
154 stars 12 forks source link

Fix pinyin fuzzysearch #131

Closed jjw24 closed 4 years ago

jjw24 commented 4 years ago

Problem context:

  1. Pinyin search not working as intended. #111
  2. ScoreForPinyin code is inefficiently looping FuzzySearch method

Resolution:

Result: image

image

jjw24 commented 4 years ago

@theClueless first cut, let me know your thoughts.

theClueless commented 4 years ago

I don't really understand the function idea... it just seemed over complicated (and increase the coupling between the classes) and I still don't understand how this works with what I wrote: Yeah i meant including the old code was not efficient. What happens is it just converts Chinese into pinyin (alphabet)

to my understanding this is not what it is doing. It take normal English and English chars are converted into Chinese. like shown in the issue: user types: "ditu" so the code need to search "ditu" (for normal English) and addition to that also "地图". so the current code is efficient since it does what it needs to, right? I'm not sure how you could do that differently.

Also, Is there any other language that has that odd concept? Is it really worth making this generic?

jjw24 commented 4 years ago

I don't really understand the function idea... it just seemed over complicated (and increase the coupling between the classes) and I still don't understand how this works with what I wrote:

What i am doing is essentially using the Func delegate, which defines input output and leave the bahviour to the caller to decide. Delegate can be intepreted as a placeholder for a method, and by defining the delegate, I am essentially telling the user of FuzzySearch to pass in their own custom behaviour via the deleagate should they have the need to standardise query and string to compare strings so characters can be compared, as in the case of Chinese pinyin and Chinese characters.

I wouldn't say it's over complicated, and using delegates is also good for unit testing eg. i can test fuzzysearch code and any delegate methods seperately. Quite to the contrary it actually helps with decoupling methods.

For example: image

then i can unit test the functions isMatchFoundInPrevLoop, areAllSubstringMatched behaviour seperately, rather than right now I have to test the entire FuzzySearch function.

jjw24 commented 4 years ago

to my understanding this is not what it is doing. It take normal English and English chars are converted into Chinese. like shown in the issue: user types: "ditu" so the code need to search "ditu" (for normal English) and addition to that also "地图". so the current code is efficient since it does what it needs to, right? I'm not sure how you could do that differently.

ScoreForPinyin method contains at least two calls to FuzzySearch. So essentially to search "地图" what ends up doing is FuzzySearch => score = 0 => swap query and compare string then call ScoreForPinyin => FuzzySearch calls again for matching pinyins after converting "地图" => then do FuzzySearch calls again for pinyin acronyms.

Swapping query and compare string around to run ScoreForPinyin is also a hack to get it working.

My change will convert query and stringToCompare to pinyin if either is in Chinese, then just run FuzzySearch once. Obviously it would not touch the strings if either is not in Chinese characters.

jjw24 commented 4 years ago

Also, Is there any other language that has that odd concept? Is it really worth making this generic?

Pinyin is just Chinese characters as alphabet, and one of its usages is to spell Chinese characters. So essentially when 'use Pinyin' is turned on, we just need the code to convert any strings that contains Chinese into pinyin (the code has no probem comparing Chinese characters to Chinese Characters or any other language as long as they are the same character style).

Not sure if any other language has similar concept, but making it generic will enable the code to make query and compare string the same character style via the func delegate, and leaving how to implement it to the caller.

Happy to discuss further.

theClueless commented 4 years ago

to my understanding this is not what it is doing. It take normal English and English chars are converted into Chinese. like shown in the issue: user types: "ditu" so the code need to search "ditu" (for normal English) and addition to that also "地图". so the current code is efficient since it does what it needs to, right? I'm not sure how you could do that differently.

ScoreForPinyin method contains at least two calls to FuzzySearch. So essentially to search "地图" what ends up doing is FuzzySearch => score = 0 => swap query and compare string then call ScoreForPinyin => FuzzySearch calls again for matching pinyins after converting "地图" => then do FuzzySearch calls again for pinyin acronyms.

Swapping query and compare string around to run ScoreForPinyin is also a hack to get it working.

My change will convert query and stringToCompare to pinyin if either is in Chinese, then just run FuzzySearch once. Obviously it would not touch the strings if either is not in Chinese characters.

I understand the change technically. I'm saying that in the old code it would search both (ditu and 地图) and I think it is a feature and not a bug. For example: If the only program I have is "ditur". and I type ditu, your code will convert it to 地图 and will return no match but the old code would return ditur program as a result. because it will search for both ditu and 地图. and what you did breaks that behavior.

Does that make sense?

theClueless commented 4 years ago

Also, Is there any other language that has that odd concept? Is it really worth making this generic?

Pinyin is just Chinese characters as alphabet, and one of its usages is to spell Chinese characters. So essentially when 'use Pinyin' is turned on, we just need the code to convert any strings that contains Chinese into pinyin (the code has no probem comparing Chinese characters to Chinese Characters or any other language as long as they are the same character style).

Not sure if any other language has similar concept, but making it generic will enable the code to make query and compare string the same character style via the func delegate, and leaving how to implement it to the caller.

Happy to discuss further.

And I agree with that generally, but since it is really a niece feature and everything is static you putting something out to the caller which is not straightforward (API wise it is not really user friendly). I think that if it was not static we could have done a very simple composition and have it as a service. for now I would add it as a state to the string marcher class. let me do an example for that to try and show you what I mean?

jjw24 commented 4 years ago

I'm saying that in the old code it would search both (ditu and 地图) and I think it is a feature and not a bug.

Agree.

For example: If the only program I have is "ditur". and I type ditu, your code will convert it to 地图 and will return no match but the old code would return ditur program as a result. because it will search for both ditu and 地图. and what you did breaks that behavior.

Actually my code would not convert ditu to 地图, it only converts 地图 to ditu if detected the Chinese character. That's why if you look at the screenshot in the PR description it returns both ditur and 地图.

The conversion has always been from Chinese chars to Pinyin, and not the other way.

jjw24 commented 4 years ago

for now I would add it as a state to the string marcher class. let me do an example for that to try and show you what I mean?

Sure thing

theClueless commented 4 years ago

The conversion has always been from Chinese chars to Pinyin, and not the other way.

I understood it was the other way around hence all my confusion :) So now I ask the reverse question :) If I type 地图, it will only search ditu and not 地图, which is still breaking the original behavior. right?

jjw24 commented 4 years ago

The conversion has always been from Chinese chars to Pinyin, and not the other way.

I understood it was the other way around hence all my confusion :) So now I ask the reverse question :) If I type 地图, it will only search ditu and not 地图, which is still breaking the original behavior. right?

image since it converts both query and comparestring to pinyin(as both are Chinese characters), you will still get 地图

image It will not convert query string's ditu because it is already alphabet, and it converts compare string's 地图. So you get both ditu and ditur

this is the correct behaviour we are talking about right?

theClueless commented 4 years ago

The conversion has always been from Chinese chars to Pinyin, and not the other way.

I understood it was the other way around hence all my confusion :) So now I ask the reverse question :) If I type 地图, it will only search ditu and not 地图, which is still breaking the original behavior. right?

image since it converts both query and comparestring to pinyin(as both are Chinese characters), you will still get 地图

image It will not convert query string's ditu because it is already alphabet, and it converts compare string's 地图. So you get both ditu and ditur

this is the correct behaviour we are talking about right?

you are totally right, thank you for your patience :)

theClueless commented 4 years ago

for now I would add it as a state to the string marcher class. let me do an example for that to try and show you what I mean?

Sure thing

Started something didn't finish yet, but it shows the concept I think we should go to. Branch: clueless/fix_pinyin_fuzzysearch

jjw24 commented 4 years ago

@theClueless merged your changes + updated some more

jjw24 commented 4 years ago

All good to go let's get this merged